Bankruptcy prediction is of growing interest to firms that stand to lose money on unpaid debts. Because computers can store large datasets on bankruptcies, making accurate predictions from them ahead of time is becoming increasingly important.
The data were collected from the Taiwan Economic Journal for the years 1999 to 2009. Company bankruptcy was defined based on the business regulations of the Taiwan Stock Exchange.
In this project you will apply various classification algorithms to the bankruptcy dataset to predict bankruptcies with satisfactory accuracy well before the actual event.
Column names and descriptions were updated to make the data easier to understand (Y = output feature, X = input features):
Y - Bankrupt?: Class label (1: Yes, 0: No)
X1 - ROA(C) before interest and depreciation before interest: Return On Total Assets(C)
X2 - ROA(A) before interest and % after tax: Return On Total Assets(A)
X3 - ROA(B) before interest and depreciation after tax: Return On Total Assets(B)
X4 - Operating Gross Margin: Gross Profit/Net Sales
X5 - Realized Sales Gross Margin: Realized Gross Profit/Net Sales
X6 - Operating Profit Rate: Operating Income/Net Sales
X7 - Pre-tax net Interest Rate: Pre-Tax Income/Net Sales
X8 - After-tax net Interest Rate: Net Income/Net Sales
X9 - Non-industry income and expenditure/revenue: Net Non-operating Income Ratio
X10 - Continuous interest rate (after tax): Net Income-Exclude Disposal Gain or Loss/Net Sales
X11 - Operating Expense Rate: Operating Expenses/Net Sales
X12 - Research and development expense rate: (Research and Development Expenses)/Net Sales
X13 - Cash flow rate: Cash Flow from Operating/Current Liabilities
X14 - Interest-bearing debt interest rate: Interest-bearing Debt/Equity
X15 - Tax rate (A): Effective Tax Rate
X16 - Net Value Per Share (B): Book Value Per Share(B)
X17 - Net Value Per Share (A): Book Value Per Share(A)
X18 - Net Value Per Share (C): Book Value Per Share(C)
X19 - Persistent EPS in the Last Four Seasons: EPS-Net Income
X20 - Cash Flow Per Share
X21 - Revenue Per Share (Yuan ¥): Sales Per Share
X22 - Operating Profit Per Share (Yuan ¥): Operating Income Per Share
X23 - Per Share Net profit before tax (Yuan ¥): Pretax Income Per Share
X24 - Realized Sales Gross Profit Growth Rate
X25 - Operating Profit Growth Rate: Operating Income Growth
X26 - After-tax Net Profit Growth Rate: Net Income Growth
X27 - Regular Net Profit Growth Rate: Continuing Operating Income after Tax Growth
X28 - Continuous Net Profit Growth Rate: Net Income-Excluding Disposal Gain or Loss Growth
X29 - Total Asset Growth Rate: Total Asset Growth
X30 - Net Value Growth Rate: Total Equity Growth
X31 - Total Asset Return Growth Rate Ratio: Return on Total Asset Growth
X32 - Cash Reinvestment %: Cash Reinvestment Ratio
X33 - Current Ratio
X34 - Quick Ratio: Acid Test
X35 - Interest Expense Ratio: Interest Expenses/Total Revenue
X36 - Total debt/Total net worth: Total Liability/Equity Ratio
X37 - Debt ratio %: Liability/Total Assets
X38 - Net worth/Assets: Equity/Total Assets
X39 - Long-term fund suitability ratio (A): (Long-term Liability+Equity)/Fixed Assets
X40 - Borrowing dependency: Cost of Interest-bearing Debt
X41 - Contingent liabilities/Net worth: Contingent Liability/Equity
X42 - Operating profit/Paid-in capital: Operating Income/Capital
X43 - Net profit before tax/Paid-in capital: Pretax Income/Capital
X44 - Inventory and accounts receivable/Net value: (Inventory+Accounts Receivables)/Equity
X45 - Total Asset Turnover
X46 - Accounts Receivable Turnover
X47 - Average Collection Days: Days Receivable Outstanding
X48 - Inventory Turnover Rate (times)
X49 - Fixed Assets Turnover Frequency
X50 - Net Worth Turnover Rate (times): Equity Turnover
X51 - Revenue per person: Sales Per Employee
X52 - Operating profit per person: Operation Income Per Employee
X53 - Allocation rate per person: Fixed Assets Per Employee
X54 - Working Capital to Total Assets
X55 - Quick Assets/Total Assets
X56 - Current Assets/Total Assets
X57 - Cash/Total Assets
X58 - Quick Assets/Current Liability
X59 - Cash/Current Liability
X60 - Current Liability to Assets
X61 - Operating Funds to Liability
X62 - Inventory/Working Capital
X63 - Inventory/Current Liability
X64 - Current Liabilities/Liability
X65 - Working Capital/Equity
X66 - Current Liabilities/Equity
X67 - Long-term Liability to Current Assets
X68 - Retained Earnings to Total Assets
X69 - Total income/Total expense
X70 - Total expense/Assets
X71 - Current Asset Turnover Rate: Current Assets to Sales
X72 - Quick Asset Turnover Rate: Quick Assets to Sales
X73 - Working capitcal Turnover Rate: Working Capital to Sales
X74 - Cash Turnover Rate: Cash to Sales
X75 - Cash Flow to Sales
X76 - Fixed Assets to Assets
X77 - Current Liability to Liability
X78 - Current Liability to Equity
X79 - Equity to Long-term Liability
X80 - Cash Flow to Total Assets
X81 - Cash Flow to Liability
X82 - CFO to Assets
X83 - Cash Flow to Equity
X84 - Current Liability to Current Assets
X85 - Liability-Assets Flag: 1 if Total Liability exceeds Total Assets, 0 otherwise
X86 - Net Income to Total Assets
X87 - Total assets to GNP price
X88 - No-credit Interval
X89 - Gross Profit to Sales
X90 - Net Income to Stockholder's Equity
X91 - Liability to Equity
X92 - Degree of Financial Leverage (DFL)
X93 - Interest Coverage Ratio (Interest expense to EBIT)
X94 - Net Income Flag: 1 if Net Income is Negative for the last two years, 0 otherwise
X95 - Equity to Liability
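The Y/X numbering in the list above can also be attached to the DataFrame columns programmatically. The following is a minimal sketch, assuming the target column comes first and the features follow in the listed order; the helper name `short_code_map` and the toy frame (with made-up values) are illustrative and not part of the original notebook.

```python
import pandas as pd

def short_code_map(columns, target="Bankrupt?"):
    """Map the target column to 'Y' and every other column to 'X1', 'X2', ... in order."""
    features = [c for c in columns if c != target]
    mapping = {target: "Y"}
    mapping.update({name: f"X{i}" for i, name in enumerate(features, start=1)})
    return mapping

# Toy frame using the first three column names from the list (values are made up).
toy = pd.DataFrame({
    "Bankrupt?": [1, 0],
    "ROA(C) before interest and depreciation before interest": [0.37, 0.49],
    "ROA(A) before interest and % after tax": [0.42, 0.54],
})

codes = short_code_map(toy.columns)
renamed = toy.rename(columns=codes)
print(list(renamed.columns))
```

On the real data, passing the loaded frame's columns instead of `toy.columns` would give a lookup table between the long names and the short codes, which can help keep plot labels readable.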
# importing libraries required for the project
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np # linear algebra
import matplotlib.pyplot as plt # for plotting
import seaborn as sns # for plotting
import warnings # ignore warnings
warnings.filterwarnings('ignore')
from sklearn.preprocessing import StandardScaler,MinMaxScaler # for scaling
from sklearn.model_selection import train_test_split # for splitting the data
# Machine Learning Libraries
from sklearn.linear_model import LogisticRegression
# Model Evaluation Libraries
from sklearn.metrics import confusion_matrix,recall_score, roc_auc_score, roc_curve,f1_score
from sklearn.metrics import accuracy_score,precision_score, classification_report,precision_recall_curve
# Oversampling Libraries
from imblearn.over_sampling import SMOTE,ADASYN,BorderlineSMOTE,KMeansSMOTE,SMOTENC,SVMSMOTE
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier,AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV,RandomizedSearchCV
from lightgbm import LGBMClassifier
from hyperopt import hp,fmin,tpe,STATUS_OK,Trials
from sklearn.model_selection import cross_val_score
Importing Dataset and examining the properties of the dataset
# reading the CSV file with pandas
bank_data=pd.read_csv('COMPANY BANKRUPTCY PREDICTION.csv')
# setting the display option to show all columns
pd.set_option('display.max_columns', None)
# setting the display option to show 100 rows
pd.set_option('display.max_rows', 100)
# Print the first 5 rows of the dataframe
bank_data.head()
| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Net Value Per Share (A) | Net Value Per Share (C) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Operating Profit Per Share (Yuan ¥) | Per Share Net profit before tax (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Regular Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Operating profit/Paid-in capital | Net profit before tax/Paid-in capital | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Operating Funds to Liability | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Current Liabilities/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Cash Flow to Sales | Fixed Assets to Assets | Current Liability to Liability | Current Liability to Equity | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.370594 | 0.424389 | 0.405750 | 0.601457 | 0.601457 | 0.998969 | 0.796887 | 0.808809 | 0.302646 | 0.780985 | 1.256969e-04 | 0.0 | 0.458143 | 0.000725 | 0.0 | 0.147950 | 0.147950 | 0.147950 | 0.169141 | 0.311664 | 0.017560 | 0.095921 | 0.138736 | 0.022102 | 0.848195 | 0.688979 | 0.688979 | 0.217535 | 4.980000e+09 | 0.000327 | 0.263100 | 0.363725 | 0.002259 | 0.001208 | 0.629951 | 0.021266 | 0.207576 | 0.792424 | 0.005024 | 0.390284 | 0.006479 | 0.095885 | 0.137757 | 0.398036 | 0.086957 | 0.001814 | 0.003487 | 1.820926e-04 | 1.165007e-04 | 0.032903 | 0.034164 | 0.392913 | 0.037135 | 0.672775 | 0.166673 | 0.190643 | 0.004094 | 0.001997 | 1.473360e-04 | 0.147308 | 0.334015 | 0.276920 | 0.001036 | 0.676269 | 0.721275 | 0.339077 | 0.025592 | 0.903225 | 0.002022 | 0.064856 | 7.010000e+08 | 6.550000e+09 | 0.593831 | 4.580000e+08 | 0.671568 | 0.424206 | 0.676269 | 0.339077 | 0.126549 | 0.637555 | 0.458609 | 0.520382 | 0.312905 | 0.118250 | 0 | 0.716845 | 0.009219 | 0.622879 | 0.601453 | 0.827890 | 0.290202 | 0.026601 | 0.564050 | 1 | 0.016469 |
| 1 | 1 | 0.464291 | 0.538214 | 0.516730 | 0.610235 | 0.610235 | 0.998946 | 0.797380 | 0.809301 | 0.303556 | 0.781506 | 2.897851e-04 | 0.0 | 0.461867 | 0.000647 | 0.0 | 0.182251 | 0.182251 | 0.182251 | 0.208944 | 0.318137 | 0.021144 | 0.093722 | 0.169918 | 0.022080 | 0.848088 | 0.689693 | 0.689702 | 0.217620 | 6.110000e+09 | 0.000443 | 0.264516 | 0.376709 | 0.006016 | 0.004039 | 0.635172 | 0.012502 | 0.171176 | 0.828824 | 0.005059 | 0.376760 | 0.005835 | 0.093743 | 0.168962 | 0.397725 | 0.064468 | 0.001286 | 0.004917 | 9.360000e+09 | 7.190000e+08 | 0.025484 | 0.006889 | 0.391590 | 0.012335 | 0.751111 | 0.127236 | 0.182419 | 0.014948 | 0.004136 | 1.383910e-03 | 0.056963 | 0.341106 | 0.289642 | 0.005210 | 0.308589 | 0.731975 | 0.329740 | 0.023947 | 0.931065 | 0.002226 | 0.025516 | 1.065198e-04 | 7.700000e+09 | 0.593916 | 2.490000e+09 | 0.671570 | 0.468828 | 0.308589 | 0.329740 | 0.120916 | 0.641100 | 0.459001 | 0.567101 | 0.314163 | 0.047775 | 0 | 0.795297 | 0.008323 | 0.623652 | 0.610237 | 0.839969 | 0.283846 | 0.264577 | 0.570175 | 1 | 0.020794 |
| 2 | 1 | 0.426071 | 0.499019 | 0.472295 | 0.601450 | 0.601364 | 0.998857 | 0.796403 | 0.808388 | 0.302035 | 0.780284 | 2.361297e-04 | 25500000.0 | 0.458521 | 0.000790 | 0.0 | 0.177911 | 0.177911 | 0.193713 | 0.180581 | 0.307102 | 0.005944 | 0.092338 | 0.142803 | 0.022760 | 0.848094 | 0.689463 | 0.689470 | 0.217601 | 7.280000e+09 | 0.000396 | 0.264184 | 0.368913 | 0.011543 | 0.005348 | 0.629631 | 0.021248 | 0.207516 | 0.792484 | 0.005100 | 0.379093 | 0.006562 | 0.092318 | 0.148036 | 0.406580 | 0.014993 | 0.001495 | 0.004227 | 6.500000e+07 | 2.650000e+09 | 0.013387 | 0.028997 | 0.381968 | 0.141016 | 0.829502 | 0.340201 | 0.602806 | 0.000991 | 0.006302 | 5.340000e+09 | 0.098162 | 0.336731 | 0.277456 | 0.013879 | 0.446027 | 0.742729 | 0.334777 | 0.003715 | 0.909903 | 0.002060 | 0.021387 | 1.791094e-03 | 1.022676e-03 | 0.594502 | 7.610000e+08 | 0.671571 | 0.276179 | 0.446027 | 0.334777 | 0.117922 | 0.642765 | 0.459254 | 0.538491 | 0.314515 | 0.025346 | 0 | 0.774670 | 0.040003 | 0.623841 | 0.601449 | 0.836774 | 0.290189 | 0.026555 | 0.563706 | 1 | 0.016474 |
| 3 | 1 | 0.399844 | 0.451265 | 0.457733 | 0.583541 | 0.583541 | 0.998700 | 0.796967 | 0.808966 | 0.303350 | 0.781241 | 1.078888e-04 | 0.0 | 0.465705 | 0.000449 | 0.0 | 0.154187 | 0.154187 | 0.154187 | 0.193722 | 0.321674 | 0.014368 | 0.077762 | 0.148603 | 0.022046 | 0.848005 | 0.689110 | 0.689110 | 0.217568 | 4.880000e+09 | 0.000382 | 0.263371 | 0.384077 | 0.004194 | 0.002896 | 0.630228 | 0.009572 | 0.151465 | 0.848535 | 0.005047 | 0.379743 | 0.005366 | 0.077727 | 0.147561 | 0.397925 | 0.089955 | 0.001966 | 0.003215 | 7.130000e+09 | 9.150000e+09 | 0.028065 | 0.015463 | 0.378497 | 0.021320 | 0.725754 | 0.161575 | 0.225815 | 0.018851 | 0.002961 | 1.010646e-03 | 0.098715 | 0.348716 | 0.276580 | 0.003540 | 0.615848 | 0.729825 | 0.331509 | 0.022165 | 0.906902 | 0.001831 | 0.024161 | 8.140000e+09 | 6.050000e+09 | 0.593889 | 2.030000e+09 | 0.671519 | 0.559144 | 0.615848 | 0.331509 | 0.120760 | 0.579039 | 0.448518 | 0.604105 | 0.302382 | 0.067250 | 0 | 0.739555 | 0.003252 | 0.622929 | 0.583538 | 0.834697 | 0.281721 | 0.026697 | 0.564663 | 1 | 0.023982 |
| 4 | 1 | 0.465022 | 0.538432 | 0.522298 | 0.598783 | 0.598783 | 0.998973 | 0.797366 | 0.809304 | 0.303475 | 0.781550 | 7.890000e+09 | 0.0 | 0.462746 | 0.000686 | 0.0 | 0.167502 | 0.167502 | 0.167502 | 0.212537 | 0.319162 | 0.029690 | 0.096898 | 0.168412 | 0.022096 | 0.848258 | 0.689697 | 0.689697 | 0.217626 | 5.510000e+09 | 0.000439 | 0.265218 | 0.379690 | 0.006022 | 0.003727 | 0.636055 | 0.005150 | 0.106509 | 0.893491 | 0.005303 | 0.375025 | 0.006624 | 0.096927 | 0.167461 | 0.400079 | 0.175412 | 0.001449 | 0.004367 | 1.633674e-04 | 2.935211e-04 | 0.040161 | 0.058111 | 0.394371 | 0.023988 | 0.751822 | 0.260330 | 0.358380 | 0.014161 | 0.004275 | 6.804636e-04 | 0.110195 | 0.344639 | 0.287913 | 0.004869 | 0.975007 | 0.732000 | 0.330726 | 0.000000 | 0.913850 | 0.002224 | 0.026385 | 6.680000e+09 | 5.050000e+09 | 0.593915 | 8.240000e+08 | 0.671563 | 0.309555 | 0.975007 | 0.330726 | 0.110933 | 0.622374 | 0.454411 | 0.578469 | 0.311567 | 0.047725 | 0 | 0.795016 | 0.003878 | 0.623521 | 0.598782 | 0.839973 | 0.278514 | 0.024752 | 0.575617 | 1 | 0.035490 |
# printing the tail of the dataframe
bank_data.tail()
| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Net Value Per Share (A) | Net Value Per Share (C) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Operating Profit Per Share (Yuan ¥) | Per Share Net profit before tax (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Regular Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Operating profit/Paid-in capital | Net profit before tax/Paid-in capital | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Operating Funds to Liability | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Current Liabilities/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Cash Flow to Sales | Fixed Assets to Assets | Current Liability to Liability | Current Liability to Equity | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6814 | 0 | 0.493687 | 0.539468 | 0.543230 | 0.604455 | 0.604462 | 0.998992 | 0.797409 | 0.809331 | 0.303510 | 0.781588 | 1.510213e-04 | 4.500000e+09 | 0.463734 | 1.790179e-04 | 0.113372 | 0.175045 | 0.175045 | 0.175045 | 0.216602 | 0.320966 | 0.020766 | 0.098200 | 0.172102 | 0.022374 | 0.848205 | 0.689778 | 0.689778 | 0.217635 | 7.070000e+09 | 0.000450 | 0.264517 | 0.380155 | 0.010451 | 0.005457 | 0.631415 | 0.006655 | 0.124618 | 0.875382 | 0.005150 | 0.373823 | 0.005366 | 0.098222 | 0.171111 | 0.404804 | 0.103448 | 0.000690 | 0.009177 | 4.030000e+07 | 0.000143 | 0.027903 | 0.006348 | 0.392596 | 0.006312 | 0.817769 | 0.312840 | 0.578455 | 0.099481 | 0.005469 | 0.005072 | 0.103838 | 0.346224 | 0.277543 | 0.013212 | 0.786888 | 0.736716 | 0.330914 | 1.792237e-03 | 0.925611 | 0.002266 | 0.019060 | 0.000229 | 0.000124 | 0.593985 | 1.077940e-04 | 0.671570 | 0.400338 | 0.786888 | 0.330914 | 0.112622 | 0.639806 | 0.458639 | 0.587178 | 0.314063 | 0.027951 | 0 | 0.799927 | 0.000466 | 0.623620 | 0.604455 | 0.840359 | 0.279606 | 0.027064 | 0.566193 | 1 | 0.029890 |
| 6815 | 0 | 0.475162 | 0.538269 | 0.524172 | 0.598308 | 0.598308 | 0.998992 | 0.797414 | 0.809327 | 0.303520 | 0.781586 | 5.220000e+09 | 1.440000e+09 | 0.461978 | 2.370237e-04 | 0.371596 | 0.181324 | 0.181324 | 0.181324 | 0.216697 | 0.318278 | 0.023050 | 0.098608 | 0.172780 | 0.022159 | 0.848245 | 0.689734 | 0.689734 | 0.217631 | 5.220000e+09 | 0.000445 | 0.264730 | 0.377389 | 0.009259 | 0.006741 | 0.631489 | 0.004623 | 0.099253 | 0.900747 | 0.006772 | 0.372505 | 0.008619 | 0.098572 | 0.171805 | 0.399926 | 0.103448 | 0.000655 | 0.009652 | 9.940000e+09 | 0.000605 | 0.027419 | 0.016083 | 0.393625 | 0.003401 | 0.793387 | 0.335085 | 0.444043 | 0.080337 | 0.006790 | 0.004727 | 0.089901 | 0.342166 | 0.277368 | 0.006730 | 0.849898 | 0.734584 | 0.329753 | 2.204673e-03 | 0.932629 | 0.002288 | 0.011118 | 0.000152 | 0.000117 | 0.593954 | 7.710000e+09 | 0.671572 | 0.096136 | 0.849898 | 0.329753 | 0.112329 | 0.642072 | 0.459058 | 0.569498 | 0.314446 | 0.031470 | 0 | 0.799748 | 0.001959 | 0.623931 | 0.598306 | 0.840306 | 0.278132 | 0.027009 | 0.566018 | 1 | 0.038284 |
| 6816 | 0 | 0.472725 | 0.533744 | 0.520638 | 0.610444 | 0.610213 | 0.998984 | 0.797401 | 0.809317 | 0.303512 | 0.781546 | 2.509312e-04 | 1.039086e-04 | 0.472189 | 0.000000e+00 | 0.490839 | 0.269521 | 0.269521 | 0.269521 | 0.210929 | 0.324857 | 0.044255 | 0.100073 | 0.173232 | 0.022068 | 0.847978 | 0.689202 | 0.689202 | 0.217547 | 5.990000e+09 | 0.000435 | 0.263858 | 0.379392 | 0.038424 | 0.035112 | 0.630612 | 0.001392 | 0.038939 | 0.961061 | 0.009149 | 0.369637 | 0.005366 | 0.100103 | 0.172287 | 0.395592 | 0.106447 | 0.001510 | 0.004188 | 2.797309e-04 | 0.001024 | 0.022419 | 0.022097 | 0.393693 | 0.002774 | 0.866047 | 0.476747 | 0.496053 | 0.412885 | 0.035531 | 0.088212 | 0.024414 | 0.358847 | 0.277022 | 0.007810 | 0.553964 | 0.737432 | 0.326921 | 0.000000e+00 | 0.932000 | 0.002239 | 0.035446 | 0.000176 | 0.000175 | 0.594025 | 4.074263e-04 | 0.671564 | 0.055509 | 0.553964 | 0.326921 | 0.110933 | 0.631678 | 0.452465 | 0.589341 | 0.313353 | 0.007542 | 0 | 0.797778 | 0.002840 | 0.624156 | 0.610441 | 0.840138 | 0.275789 | 0.026791 | 0.565158 | 1 | 0.097649 |
| 6817 | 0 | 0.506264 | 0.559911 | 0.554045 | 0.607850 | 0.607850 | 0.999074 | 0.797500 | 0.809399 | 0.303498 | 0.781663 | 1.236154e-04 | 2.510000e+09 | 0.476123 | 2.110211e-04 | 0.181294 | 0.213392 | 0.213392 | 0.213392 | 0.228326 | 0.346573 | 0.031535 | 0.111799 | 0.185584 | 0.022350 | 0.854064 | 0.696113 | 0.696113 | 0.218006 | 7.250000e+09 | 0.000529 | 0.264409 | 0.401028 | 0.012782 | 0.007256 | 0.630731 | 0.003816 | 0.086979 | 0.913021 | 0.005529 | 0.369649 | 0.007068 | 0.111722 | 0.182498 | 0.401540 | 0.109445 | 0.000716 | 0.008829 | 4.550000e+09 | 0.000233 | 0.027258 | 0.012749 | 0.396735 | 0.007489 | 0.832340 | 0.353624 | 0.564439 | 0.112238 | 0.007753 | 0.007133 | 0.083199 | 0.380251 | 0.277353 | 0.013334 | 0.893241 | 0.736713 | 0.329294 | 3.200000e+09 | 0.939613 | 0.002395 | 0.016443 | 0.000214 | 0.000135 | 0.593997 | 1.165392e-04 | 0.671606 | 0.246805 | 0.893241 | 0.329294 | 0.110957 | 0.684857 | 0.471313 | 0.678338 | 0.320118 | 0.022916 | 0 | 0.811808 | 0.002837 | 0.623957 | 0.607846 | 0.841084 | 0.277547 | 0.026822 | 0.565302 | 1 | 0.044009 |
| 6818 | 0 | 0.493053 | 0.570105 | 0.549548 | 0.627409 | 0.627409 | 0.998080 | 0.801987 | 0.813800 | 0.313415 | 0.786079 | 1.431695e-03 | 0.000000e+00 | 0.427721 | 5.900000e+08 | 0.000000 | 0.220766 | 0.220766 | 0.220766 | 0.227758 | 0.305793 | 0.000665 | 0.092501 | 0.182119 | 0.025316 | 0.848053 | 0.689527 | 0.689527 | 0.217605 | 9.350000e+09 | 0.000519 | 0.264186 | 0.360102 | 0.051348 | 0.040897 | 0.630618 | 0.000461 | 0.014149 | 0.985851 | 0.058476 | 0.370049 | 0.006368 | 0.092465 | 0.179911 | 0.393883 | 0.002999 | 0.000325 | 0.019474 | 1.910000e+07 | 0.000300 | 0.009194 | 0.002097 | 0.385767 | 0.000963 | 0.873759 | 0.527136 | 0.505010 | 0.238147 | 0.051481 | 0.066674 | 0.018517 | 0.239585 | 0.276975 | 0.000000 | 1.000000 | 0.737286 | 0.326690 | 0.000000e+00 | 0.938005 | 0.002791 | 0.006089 | 0.007864 | 0.008238 | 0.598674 | 9.505992e-03 | 0.672096 | 0.005016 | 1.000000 | 0.326690 | 0.110933 | 0.659917 | 0.483285 | 0.505531 | 0.316238 | 0.005579 | 0 | 0.815956 | 0.000707 | 0.626680 | 0.627408 | 0.841019 | 0.275114 | 0.026793 | 0.565167 | 1 | 0.233902 |
# Checking value counts of the target variable
target_counts=bank_data["Bankrupt?"].value_counts()
print(target_counts)
print(np.round(target_counts[1]/target_counts[0], 3)) # ratio of bankrupt to non-bankrupt firms
0    6599
1     220
Name: Bankrupt?, dtype: int64
0.033
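The class counts above show a strongly imbalanced target. As a rough illustration, the sketch below recomputes the bankrupt to non-bankrupt ratio and the share of bankrupt firms from those two counts; the counts are hard-coded here so the snippet runs without the CSV file, and the variable names `minority_to_majority` and `minority_share` are assumptions introduced for this example.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
target_counts = pd.Series({0: 6599, 1: 220}, name="Bankrupt?")

# Ratio of bankrupt (1) to non-bankrupt (0) firms.
minority_to_majority = target_counts[1] / target_counts[0]

# Share of bankrupt firms among all observations.
minority_share = target_counts[1] / target_counts.sum()

print(f"bankrupt : non-bankrupt ratio = {minority_to_majority:.3f}")
print(f"share of bankrupt firms       = {minority_share:.2%}")
```

With only about 3% positive cases, accuracy alone is a weak yardstick, which is presumably why the notebook imports recall, precision, F1 and the `imblearn` oversamplers.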
sns.countplot(x='Bankrupt?', data=bank_data) # plotting the count plot of target variable
<AxesSubplot:xlabel='Bankrupt?', ylabel='count'>
bank_data.info() # to know the data types of the columns
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6819 entries, 0 to 6818
Data columns (total 96 columns):
#   Column  Non-Null Count  Dtype
---  ------  --------------  -----
0   Bankrupt?  6819 non-null  int64
1   ROA(C) before interest and depreciation before interest  6819 non-null  float64
2   ROA(A) before interest and % after tax  6819 non-null  float64
3   ROA(B) before interest and depreciation after tax  6819 non-null  float64
4   Operating Gross Margin  6819 non-null  float64
5   Realized Sales Gross Margin  6819 non-null  float64
6   Operating Profit Rate  6819 non-null  float64
7   Pre-tax net Interest Rate  6819 non-null  float64
8   After-tax net Interest Rate  6819 non-null  float64
9   Non-industry income and expenditure/revenue  6819 non-null  float64
10  Continuous interest rate (after tax)  6819 non-null  float64
11  Operating Expense Rate  6819 non-null  float64
12  Research and development expense rate  6819 non-null  float64
13  Cash flow rate  6819 non-null  float64
14  Interest-bearing debt interest rate  6819 non-null  float64
15  Tax rate (A)  6819 non-null  float64
16  Net Value Per Share (B)  6819 non-null  float64
17  Net Value Per Share (A)  6819 non-null  float64
18  Net Value Per Share (C)  6819 non-null  float64
19  Persistent EPS in the Last Four Seasons  6819 non-null  float64
20  Cash Flow Per Share  6819 non-null  float64
21  Revenue Per Share (Yuan ¥)  6819 non-null  float64
22  Operating Profit Per Share (Yuan ¥)  6819 non-null  float64
23  Per Share Net profit before tax (Yuan ¥)  6819 non-null  float64
24  Realized Sales Gross Profit Growth Rate  6819 non-null  float64
25  Operating Profit Growth Rate  6819 non-null  float64
26  After-tax Net Profit Growth Rate  6819 non-null  float64
27  Regular Net Profit Growth Rate  6819 non-null  float64
28  Continuous Net Profit Growth Rate  6819 non-null  float64
29  Total Asset Growth Rate  6819 non-null  float64
30  Net Value Growth Rate  6819 non-null  float64
31  Total Asset Return Growth Rate Ratio  6819 non-null  float64
32  Cash Reinvestment %  6819 non-null  float64
33  Current Ratio  6819 non-null  float64
34  Quick Ratio  6819 non-null  float64
35  Interest Expense Ratio  6819 non-null  float64
36  Total debt/Total net worth  6819 non-null  float64
37  Debt ratio %  6819 non-null  float64
38  Net worth/Assets  6819 non-null  float64
39  Long-term fund suitability ratio (A)  6819 non-null  float64
40  Borrowing dependency  6819 non-null  float64
41  Contingent liabilities/Net worth  6819 non-null  float64
42  Operating profit/Paid-in capital  6819 non-null  float64
43  Net profit before tax/Paid-in capital  6819 non-null  float64
44  Inventory and accounts receivable/Net value  6819 non-null  float64
45  Total Asset Turnover  6819 non-null  float64
46  Accounts Receivable Turnover  6819 non-null  float64
47  Average Collection Days  6819 non-null  float64
48  Inventory Turnover Rate (times)  6819 non-null  float64
49  Fixed Assets Turnover Frequency  6819 non-null  float64
50  Net Worth Turnover Rate (times)  6819 non-null  float64
51  Revenue per person  6819 non-null  float64
52  Operating profit per person  6819 non-null  float64
53  Allocation rate per person  6819 non-null  float64
54  Working Capital to Total Assets  6819 non-null  float64
55  Quick Assets/Total Assets  6819 non-null  float64
56  Current Assets/Total Assets  6819 non-null  float64
57  Cash/Total Assets  6819 non-null  float64
58  Quick Assets/Current Liability  6819 non-null  float64
59  Cash/Current Liability  6819 non-null  float64
60  Current Liability to Assets  6819 non-null  float64
61  Operating Funds to Liability  6819 non-null  float64
62  Inventory/Working Capital  6819 non-null  float64
63  Inventory/Current Liability  6819 non-null  float64
64  Current Liabilities/Liability  6819 non-null  float64
65  Working Capital/Equity  6819 non-null  float64
66  Current Liabilities/Equity  6819 non-null  float64
67  Long-term Liability to Current Assets  6819 non-null  float64
68  Retained Earnings to Total Assets  6819 non-null  float64
69  Total income/Total expense  6819 non-null  float64
70  Total expense/Assets  6819 non-null  float64
71  Current Asset Turnover Rate  6819 non-null  float64
72  Quick Asset Turnover Rate  6819 non-null  float64
73  Working capitcal Turnover Rate  6819 non-null  float64
74  Cash Turnover Rate  6819 non-null  float64
75  Cash Flow to Sales  6819 non-null  float64
76  Fixed Assets to Assets  6819 non-null  float64
77  Current Liability to Liability  6819 non-null  float64
78  Current Liability to Equity  6819 non-null  float64
79  Equity to Long-term Liability  6819 non-null  float64
80  Cash Flow to Total Assets  6819 non-null  float64
81  Cash Flow to Liability  6819 non-null  float64
82  CFO to Assets  6819 non-null  float64
83  Cash Flow to Equity  6819 non-null  float64
84  Current Liability to Current Assets  6819 non-null  float64
85  Liability-Assets Flag  6819 non-null  int64
86  Net Income to Total Assets  6819 non-null  float64
87  Total assets to GNP price  6819 non-null  float64
88  No-credit Interval  6819 non-null  float64
89  Gross Profit to Sales  6819 non-null  float64
90  Net Income to Stockholder's Equity  6819 non-null  float64
91  Liability to Equity  6819 non-null  float64
92  Degree of Financial Leverage (DFL)  6819 non-null  float64
93  Interest Coverage Ratio (Interest expense to EBIT)  6819 non-null  float64
94  Net Income Flag  6819 non-null  int64
95  Equity to Liability  6819 non-null  float64
dtypes: float64(93), int64(3)
memory usage: 5.0 MB
The dataset has 96 columns: 93 are float64 and 3 are int64. Every column has 6819 non-null entries, so there are no missing values.
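The dtype tally and the absence of missing values can also be checked directly rather than read off the `info()` listing. The sketch below does this on a small stand-in frame so that it runs without the CSV file; the frame `toy` and its values are made up for illustration, and on the real data `bank_data` would take its place.

```python
import pandas as pd

# Toy stand-in for bank_data: an int target, two float ratios and an int flag.
toy = pd.DataFrame({
    "Bankrupt?": [1, 0, 0],
    "Current Ratio": [0.002, 0.010, 0.038],
    "Quick Ratio": [0.001, 0.005, 0.035],
    "Net Income Flag": [1, 1, 1],
})

# Number of columns per dtype (the real data should give 93 float64 and 3 int64).
dtype_counts = toy.dtypes.value_counts()

# Names of the integer columns, i.e. the target and the binary flags.
int_columns = toy.select_dtypes(include="int64").columns.tolist()

# Total number of missing cells across the whole frame.
missing_total = int(toy.isna().sum().sum())

print(dtype_counts)
print(int_columns)
print(missing_total)
```

Listing the integer columns is a quick way to spot the binary flags, which usually deserve different treatment from the continuous ratios during scaling.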
bank_data.shape # printing the shape of the dataframe
(6819, 96)
The dataset has 6819 rows and 96 columns.
bank_data.describe()# to know the statistical summary of the dataframe
| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Net Value Per Share (A) | Net Value Per Share (C) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Operating Profit Per Share (Yuan ¥) | Per Share Net profit before tax (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Regular Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Operating profit/Paid-in capital | Net profit before tax/Paid-in capital | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Operating Funds to Liability | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Current Liabilities/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Cash Flow to Sales | Fixed Assets to Assets | Current Liability to Liability | Current Liability to Equity | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Net Income Flag | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6.819000e+03 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.000000 | 6819.0 | 6819.000000 |
| mean | 0.032263 | 0.505180 | 0.558625 | 0.553589 | 0.607948 | 0.607929 | 0.998755 | 0.797190 | 0.809084 | 0.303623 | 0.781381 | 1.995347e+09 | 1.950427e+09 | 0.467431 | 1.644801e+07 | 0.115001 | 0.190661 | 0.190633 | 0.190672 | 0.228813 | 0.323482 | 1.328641e+06 | 0.109091 | 0.184361 | 0.022408 | 0.847980 | 0.689146 | 0.689150 | 0.217639 | 5.508097e+09 | 1.566212e+06 | 0.264248 | 0.379677 | 4.032850e+05 | 8.376595e+06 | 0.630991 | 4.416337e+06 | 0.113177 | 0.886823 | 0.008783 | 0.374654 | 0.005968 | 0.108977 | 0.182715 | 0.402459 | 0.141606 | 1.278971e+07 | 9.826221e+06 | 2.149106e+09 | 1.008596e+09 | 0.038595 | 2.325854e+06 | 0.400671 | 1.125579e+07 | 0.814125 | 0.400132 | 0.522273 | 0.124095 | 3.592902e+06 | 3.715999e+07 | 0.090673 | 0.353828 | 0.277395 | 5.580680e+07 | 0.761599 | 0.735817 | 0.331410 | 5.416004e+07 | 0.934733 | 0.002549 | 0.029184 | 1.195856e+09 | 2.163735e+09 | 0.594006 | 2.471977e+09 | 0.671531 | 1.220121e+06 | 0.761599 | 0.331410 | 0.115645 | 0.649731 | 0.461849 | 0.593415 | 0.315582 | 0.031506 | 0.001173 | 0.807760 | 1.862942e+07 | 0.623915 | 0.607946 | 0.840402 | 0.280365 | 0.027541 | 0.565358 | 1.0 | 0.047578 |
| std | 0.176710 | 0.060686 | 0.065620 | 0.061595 | 0.016934 | 0.016916 | 0.013010 | 0.012869 | 0.013601 | 0.011163 | 0.012679 | 3.237684e+09 | 2.598292e+09 | 0.017036 | 1.082750e+08 | 0.138667 | 0.033390 | 0.033474 | 0.033480 | 0.033263 | 0.017611 | 5.170709e+07 | 0.027942 | 0.033180 | 0.012079 | 0.010752 | 0.013853 | 0.013910 | 0.010063 | 2.897718e+09 | 1.141594e+08 | 0.009634 | 0.020737 | 3.330216e+07 | 2.446847e+08 | 0.011238 | 1.684069e+08 | 0.053920 | 0.053920 | 0.028153 | 0.016286 | 0.012188 | 0.027782 | 0.030785 | 0.013324 | 0.101145 | 2.782598e+08 | 2.563589e+08 | 3.247967e+09 | 2.477557e+09 | 0.036680 | 1.366327e+08 | 0.032720 | 2.945063e+08 | 0.059054 | 0.201998 | 0.218112 | 0.139251 | 1.716209e+08 | 5.103509e+08 | 0.050290 | 0.035147 | 0.010469 | 5.820516e+08 | 0.206677 | 0.011678 | 0.013488 | 5.702706e+08 | 0.025564 | 0.012093 | 0.027149 | 2.821161e+09 | 3.374944e+09 | 0.008959 | 2.938623e+09 | 0.009341 | 1.007542e+08 | 0.206677 | 0.013488 | 0.019529 | 0.047372 | 0.029943 | 0.058561 | 0.012961 | 0.030845 | 0.034234 | 0.040332 | 3.764501e+08 | 0.012290 | 0.016934 | 0.014523 | 0.014463 | 0.015668 | 0.013214 | 0.0 | 0.050014 |
| min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.000000 |
| 25% | 0.000000 | 0.476527 | 0.535543 | 0.527277 | 0.600445 | 0.600434 | 0.998969 | 0.797386 | 0.809312 | 0.303466 | 0.781567 | 1.566874e-04 | 1.281880e-04 | 0.461558 | 2.030203e-04 | 0.000000 | 0.173613 | 0.173613 | 0.173676 | 0.214711 | 0.317748 | 1.563138e-02 | 0.096083 | 0.170370 | 0.022065 | 0.847984 | 0.689270 | 0.689270 | 0.217580 | 4.860000e+09 | 4.409689e-04 | 0.263759 | 0.374749 | 7.555047e-03 | 4.725903e-03 | 0.630612 | 3.007049e-03 | 0.072891 | 0.851196 | 0.005244 | 0.370168 | 0.005366 | 0.096105 | 0.169376 | 0.397403 | 0.076462 | 7.101336e-04 | 4.386530e-03 | 1.728256e-04 | 2.330013e-04 | 0.021774 | 1.043285e-02 | 0.392438 | 4.120529e-03 | 0.774309 | 0.241973 | 0.352845 | 0.033543 | 5.239776e-03 | 1.973008e-03 | 0.053301 | 0.341023 | 0.277034 | 3.163148e-03 | 0.626981 | 0.733612 | 0.328096 | 0.000000e+00 | 0.931097 | 0.002236 | 0.014567 | 1.456236e-04 | 1.417149e-04 | 0.593934 | 2.735337e-04 | 0.671565 | 8.536037e-02 | 0.626981 | 0.328096 | 0.110933 | 0.633265 | 0.457116 | 0.565987 | 0.312995 | 0.018034 | 0.000000 | 0.796750 | 9.036205e-04 | 0.623636 | 0.600443 | 0.840115 | 0.276944 | 0.026791 | 0.565158 | 1.0 | 0.024477 |
| 50% | 0.000000 | 0.502706 | 0.559802 | 0.552278 | 0.605997 | 0.605976 | 0.999022 | 0.797464 | 0.809375 | 0.303525 | 0.781635 | 2.777589e-04 | 5.090000e+08 | 0.465080 | 3.210321e-04 | 0.073489 | 0.184400 | 0.184400 | 0.184400 | 0.224544 | 0.322487 | 2.737571e-02 | 0.104226 | 0.179709 | 0.022102 | 0.848044 | 0.689439 | 0.689439 | 0.217598 | 6.400000e+09 | 4.619555e-04 | 0.264050 | 0.380425 | 1.058717e-02 | 7.412472e-03 | 0.630698 | 5.546284e-03 | 0.111407 | 0.888593 | 0.005665 | 0.372624 | 0.005366 | 0.104133 | 0.178456 | 0.400131 | 0.118441 | 9.678107e-04 | 6.572537e-03 | 7.646743e-04 | 5.930942e-04 | 0.029516 | 1.861551e-02 | 0.395898 | 7.844373e-03 | 0.810275 | 0.386451 | 0.514830 | 0.074887 | 7.908898e-03 | 4.903886e-03 | 0.082705 | 0.348597 | 0.277178 | 6.497335e-03 | 0.806881 | 0.736013 | 0.329685 | 1.974619e-03 | 0.937672 | 0.002336 | 0.022674 | 1.987816e-04 | 2.247728e-04 | 0.593963 | 1.080000e+09 | 0.671574 | 1.968810e-01 | 0.806881 | 0.329685 | 0.112340 | 0.645366 | 0.459750 | 0.593266 | 0.314953 | 0.027597 | 0.000000 | 0.810619 | 2.085213e-03 | 0.623879 | 0.605998 | 0.841179 | 0.278778 | 0.026808 | 0.565252 | 1.0 | 0.033798 |
| 75% | 0.000000 | 0.535563 | 0.589157 | 0.584105 | 0.613914 | 0.613842 | 0.999095 | 0.797579 | 0.809469 | 0.303585 | 0.781735 | 4.145000e+09 | 3.450000e+09 | 0.471004 | 5.325533e-04 | 0.205841 | 0.199570 | 0.199570 | 0.199612 | 0.238820 | 0.328623 | 4.635722e-02 | 0.116155 | 0.193493 | 0.022153 | 0.848123 | 0.689647 | 0.689647 | 0.217622 | 7.390000e+09 | 4.993621e-04 | 0.264388 | 0.386731 | 1.626953e-02 | 1.224911e-02 | 0.631125 | 9.273293e-03 | 0.148804 | 0.927109 | 0.006847 | 0.376271 | 0.005764 | 0.115927 | 0.191607 | 0.404551 | 0.176912 | 1.454759e-03 | 8.972876e-03 | 4.620000e+09 | 3.652371e-03 | 0.042903 | 3.585477e-02 | 0.401851 | 1.502031e-02 | 0.850383 | 0.540594 | 0.689051 | 0.161073 | 1.295091e-02 | 1.280557e-02 | 0.119523 | 0.360915 | 0.277429 | 1.114677e-02 | 0.942027 | 0.738560 | 0.332322 | 9.005946e-03 | 0.944811 | 0.002492 | 0.035930 | 4.525945e-04 | 4.900000e+09 | 0.594002 | 4.510000e+09 | 0.671587 | 3.722000e-01 | 0.942027 | 0.332322 | 0.117106 | 0.663062 | 0.464236 | 0.624769 | 0.317707 | 0.038375 | 0.000000 | 0.826455 | 5.269777e-03 | 0.624168 | 0.613913 | 0.842357 | 0.281449 | 0.026913 | 0.565725 | 1.0 | 0.052838 |
| max | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9.990000e+09 | 9.980000e+09 | 1.000000 | 9.900000e+08 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 3.020000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9.990000e+09 | 9.330000e+09 | 1.000000 | 1.000000 | 2.750000e+09 | 9.230000e+09 | 1.000000 | 9.940000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9.740000e+09 | 9.730000e+09 | 9.990000e+09 | 9.990000e+09 | 1.000000 | 8.810000e+09 | 1.000000 | 9.570000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 8.820000e+09 | 9.650000e+09 | 1.000000 | 1.000000 | 1.000000 | 9.910000e+09 | 1.000000 | 1.000000 | 1.000000 | 9.540000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000e+10 | 1.000000e+10 | 1.000000 | 1.000000e+10 | 1.000000 | 8.320000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9.820000e+09 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.0 | 1.000000 |
# List the columns that take fewer than 10 distinct values
for i in bank_data.describe().columns:
    n_unique = len(bank_data[i].unique())
    if n_unique < 10:
        print(i, 'with', n_unique, 'values')
Bankrupt? with 2 values
Liability-Assets Flag with 2 values
Net Income Flag with 1 values
print(bank_data[' Net Income Flag'].unique())
# Drop the Net Income Flag column: it is constant, so it carries no information
bank_data.drop([' Net Income Flag'], axis=1, inplace=True)
[1]
Net Income Flag has only one unique value (1), so it cannot help distinguish bankrupt companies from the others.
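The same check can be applied to every column at once. The following is a minimal sketch, assuming a small made-up frame `toy` in place of `bank_data`; the helper name `drop_constant_columns` is hypothetical and not part of this notebook.

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns that hold a single unique value."""
    # nunique(dropna=False) counts distinct values per column, NaN included
    constant = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    return df.drop(columns=constant)

# Hypothetical toy data, not the bankruptcy dataset
toy = pd.DataFrame({
    "Bankrupt?": [0, 1, 0, 0],
    " Net Income Flag": [1, 1, 1, 1],
    " Debt ratio %": [0.11, 0.25, 0.09, 0.14],
})
reduced = drop_constant_columns(toy)
print(list(reduced.columns))
```

Unlike `drop(..., inplace=True)`, this sketch returns a new frame and leaves the input untouched, which makes it easier to re-run a cell without errors.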
pd.DataFrame(data=bank_data.isna().mean()*100,index=bank_data.columns).T
| | Bankrupt? | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Net Value Per Share (A) | Net Value Per Share (C) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Operating Profit Per Share (Yuan ¥) | Per Share Net profit before tax (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Regular Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Operating profit/Paid-in capital | Net profit before tax/Paid-in capital | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Operating Funds to Liability | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Current Liabilities/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Cash Flow to Sales | Fixed Assets to Assets | Current Liability to Liability | Current Liability to Equity | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
sns.heatmap(bank_data.isnull(), yticklabels=False, cbar=False, cmap='coolwarm')
<AxesSubplot:>
bank_data.isna().sum().sum()
0
The total count of missing values is 0, so the dataset contains no missing or NaN values.
numerical_columns = [x for x in bank_data.columns if x not in [' Net Income Flag', 'Bankrupt?']]
numerical_columns  # printing the numerical columns
[' ROA(C) before interest and depreciation before interest', ' ROA(A) before interest and % after tax', ' ROA(B) before interest and depreciation after tax', ' Operating Gross Margin', ' Realized Sales Gross Margin', ' Operating Profit Rate', ' Pre-tax net Interest Rate', ' After-tax net Interest Rate', ' Non-industry income and expenditure/revenue', ' Continuous interest rate (after tax)', ' Operating Expense Rate', ' Research and development expense rate', ' Cash flow rate', ' Interest-bearing debt interest rate', ' Tax rate (A)', ' Net Value Per Share (B)', ' Net Value Per Share (A)', ' Net Value Per Share (C)', ' Persistent EPS in the Last Four Seasons', ' Cash Flow Per Share', ' Revenue Per Share (Yuan ¥)', ' Operating Profit Per Share (Yuan ¥)', ' Per Share Net profit before tax (Yuan ¥)', ' Realized Sales Gross Profit Growth Rate', ' Operating Profit Growth Rate', ' After-tax Net Profit Growth Rate', ' Regular Net Profit Growth Rate', ' Continuous Net Profit Growth Rate', ' Total Asset Growth Rate', ' Net Value Growth Rate', ' Total Asset Return Growth Rate Ratio', ' Cash Reinvestment %', ' Current Ratio', ' Quick Ratio', ' Interest Expense Ratio', ' Total debt/Total net worth', ' Debt ratio %', ' Net worth/Assets', ' Long-term fund suitability ratio (A)', ' Borrowing dependency', ' Contingent liabilities/Net worth', ' Operating profit/Paid-in capital', ' Net profit before tax/Paid-in capital', ' Inventory and accounts receivable/Net value', ' Total Asset Turnover', ' Accounts Receivable Turnover', ' Average Collection Days', ' Inventory Turnover Rate (times)', ' Fixed Assets Turnover Frequency', ' Net Worth Turnover Rate (times)', ' Revenue per person', ' Operating profit per person', ' Allocation rate per person', ' Working Capital to Total Assets', ' Quick Assets/Total Assets', ' Current Assets/Total Assets', ' Cash/Total Assets', ' Quick Assets/Current Liability', ' Cash/Current Liability', ' Current Liability to Assets', ' Operating Funds to Liability', 
' Inventory/Working Capital', ' Inventory/Current Liability', ' Current Liabilities/Liability', ' Working Capital/Equity', ' Current Liabilities/Equity', ' Long-term Liability to Current Assets', ' Retained Earnings to Total Assets', ' Total income/Total expense', ' Total expense/Assets', ' Current Asset Turnover Rate', ' Quick Asset Turnover Rate', ' Working capitcal Turnover Rate', ' Cash Turnover Rate', ' Cash Flow to Sales', ' Fixed Assets to Assets', ' Current Liability to Liability', ' Current Liability to Equity', ' Equity to Long-term Liability', ' Cash Flow to Total Assets', ' Cash Flow to Liability', ' CFO to Assets', ' Cash Flow to Equity', ' Current Liability to Current Assets', ' Liability-Assets Flag', ' Net Income to Total Assets', ' Total assets to GNP price', ' No-credit Interval', ' Gross Profit to Sales', " Net Income to Stockholder's Equity", ' Liability to Equity', ' Degree of Financial Leverage (DFL)', ' Interest Coverage Ratio (Interest expense to EBIT)', ' Equity to Liability']
from IPython.display import display  # to display the dataframe

def display_quatile_dist(data, features):
    '''
    data: dataframe
    features: list of feature (column) names
    Displays the value of each feature at the quantiles
    {0, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 1}
    as a styled table with one row per quantile and one column per feature.
    '''
    r = [0, .01, .1, .25, .5, .75, .9, .99, 1]
    # quantile() on the selected columns returns one row per value in r
    a = data[features].quantile(r)
    display(a.style.bar())
display_quatile_dist(bank_data,numerical_columns)
| | ROA(C) before interest and depreciation before interest | ROA(A) before interest and % after tax | ROA(B) before interest and depreciation after tax | Operating Gross Margin | Realized Sales Gross Margin | Operating Profit Rate | Pre-tax net Interest Rate | After-tax net Interest Rate | Non-industry income and expenditure/revenue | Continuous interest rate (after tax) | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Net Value Per Share (A) | Net Value Per Share (C) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Operating Profit Per Share (Yuan ¥) | Per Share Net profit before tax (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Regular Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Operating profit/Paid-in capital | Net profit before tax/Paid-in capital | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Operating Funds to Liability | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Current Liabilities/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Cash Flow to Sales | Fixed Assets to Assets | Current Liability to Liability | Current Liability to Equity | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Net Income to Total Assets | Total assets to GNP price | No-credit Interval | Gross Profit to Sales | Net Income to Stockholder's Equity | Liability to Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.337876 | 0.352479 | 0.371831 | 0.580794 | 0.581259 | 0.997582 | 0.795218 | 0.807315 | 0.302198 | 0.779568 | 0.000102 | 0.000000 | 0.437551 | 0.000000 | 0.000000 | 0.140514 | 0.140028 | 0.140028 | 0.163861 | 0.281974 | 0.001470 | 0.066636 | 0.122775 | 0.021671 | 0.846386 | 0.678493 | 0.678509 | 0.216612 | 0.000102 | 0.000324 | 0.261876 | 0.321131 | 0.002390 | 0.000408 | 0.624236 | 0.000493 | 0.014149 | 0.761333 | 0.004911 | 0.369637 | 0.005366 | 0.067003 | 0.127517 | 0.393922 | 0.008996 | 0.000335 | 0.000410 | 0.000102 | 0.000104 | 0.010352 | 0.002279 | 0.357285 | 0.000226 | 0.684405 | 0.040822 | 0.085777 | 0.002648 | 0.000549 | 0.000173 | 0.011704 | 0.284612 | 0.273258 | 0.000000 | 0.205660 | 0.723198 | 0.326493 | 0.000000 | 0.853982 | 0.001835 | 0.004293 | 0.000102 | 0.000102 | 0.593794 | 0.000103 | 0.671375 | 0.003751 | 0.205660 | 0.326493 | 0.110933 | 0.523566 | 0.394009 | 0.429163 | 0.295183 | 0.003790 | 0.000000 | 0.679775 | 0.000174 | 0.615426 | 0.580790 | 0.826098 | 0.275102 | 0.025322 | 0.555236 | 0.013800 |
| 0.100000 | 0.442344 | 0.491529 | 0.491782 | 0.596513 | 0.596542 | 0.998865 | 0.797171 | 0.809113 | 0.303331 | 0.781369 | 0.000121 | 0.000000 | 0.457875 | 0.000000 | 0.000000 | 0.162606 | 0.162606 | 0.162606 | 0.201097 | 0.311594 | 0.009284 | 0.088674 | 0.158003 | 0.022016 | 0.847886 | 0.688892 | 0.688895 | 0.217539 | 0.000185 | 0.000413 | 0.263353 | 0.365020 | 0.005750 | 0.002625 | 0.630170 | 0.001619 | 0.044055 | 0.816066 | 0.005071 | 0.369637 | 0.005366 | 0.088804 | 0.157649 | 0.395706 | 0.047976 | 0.000544 | 0.002646 | 0.000123 | 0.000145 | 0.016774 | 0.006524 | 0.387212 | 0.001569 | 0.748224 | 0.144206 | 0.238380 | 0.014580 | 0.003140 | 0.000793 | 0.034457 | 0.332352 | 0.276971 | 0.001048 | 0.460519 | 0.731773 | 0.327296 | 0.000000 | 0.916344 | 0.002077 | 0.009190 | 0.000119 | 0.000116 | 0.593912 | 0.000141 | 0.671546 | 0.033545 | 0.460519 | 0.327296 | 0.110933 | 0.609768 | 0.450234 | 0.530051 | 0.309412 | 0.010763 | 0.000000 | 0.769568 | 0.000479 | 0.622950 | 0.596514 | 0.837845 | 0.275939 | 0.026650 | 0.564583 | 0.019089 |
| 0.250000 | 0.476527 | 0.535543 | 0.527277 | 0.600445 | 0.600434 | 0.998969 | 0.797386 | 0.809312 | 0.303466 | 0.781567 | 0.000157 | 0.000128 | 0.461558 | 0.000203 | 0.000000 | 0.173613 | 0.173613 | 0.173676 | 0.214711 | 0.317748 | 0.015631 | 0.096083 | 0.170370 | 0.022065 | 0.847984 | 0.689270 | 0.689270 | 0.217580 | 4860000000.000000 | 0.000441 | 0.263759 | 0.374749 | 0.007555 | 0.004726 | 0.630612 | 0.003007 | 0.072891 | 0.851196 | 0.005244 | 0.370168 | 0.005366 | 0.096105 | 0.169376 | 0.397403 | 0.076462 | 0.000710 | 0.004387 | 0.000173 | 0.000233 | 0.021774 | 0.010433 | 0.392438 | 0.004121 | 0.774309 | 0.241973 | 0.352845 | 0.033543 | 0.005240 | 0.001973 | 0.053301 | 0.341023 | 0.277034 | 0.003163 | 0.626981 | 0.733612 | 0.328096 | 0.000000 | 0.931097 | 0.002236 | 0.014567 | 0.000146 | 0.000142 | 0.593934 | 0.000274 | 0.671565 | 0.085360 | 0.626981 | 0.328096 | 0.110933 | 0.633265 | 0.457116 | 0.565987 | 0.312995 | 0.018034 | 0.000000 | 0.796750 | 0.000904 | 0.623636 | 0.600443 | 0.840115 | 0.276944 | 0.026791 | 0.565158 | 0.024477 |
| 0.500000 | 0.502706 | 0.559802 | 0.552278 | 0.605997 | 0.605976 | 0.999022 | 0.797464 | 0.809375 | 0.303525 | 0.781635 | 0.000278 | 509000000.000000 | 0.465080 | 0.000321 | 0.073489 | 0.184400 | 0.184400 | 0.184400 | 0.224544 | 0.322487 | 0.027376 | 0.104226 | 0.179709 | 0.022102 | 0.848044 | 0.689439 | 0.689439 | 0.217598 | 6400000000.000000 | 0.000462 | 0.264050 | 0.380425 | 0.010587 | 0.007412 | 0.630698 | 0.005546 | 0.111407 | 0.888593 | 0.005665 | 0.372624 | 0.005366 | 0.104133 | 0.178456 | 0.400131 | 0.118441 | 0.000968 | 0.006573 | 0.000765 | 0.000593 | 0.029516 | 0.018616 | 0.395898 | 0.007844 | 0.810275 | 0.386451 | 0.514830 | 0.074887 | 0.007909 | 0.004904 | 0.082705 | 0.348597 | 0.277178 | 0.006497 | 0.806881 | 0.736013 | 0.329685 | 0.001975 | 0.937672 | 0.002336 | 0.022674 | 0.000199 | 0.000225 | 0.593963 | 1080000000.000000 | 0.671574 | 0.196881 | 0.806881 | 0.329685 | 0.112340 | 0.645366 | 0.459750 | 0.593266 | 0.314953 | 0.027597 | 0.000000 | 0.810619 | 0.002085 | 0.623879 | 0.605998 | 0.841179 | 0.278778 | 0.026808 | 0.565252 | 0.033798 |
| 0.750000 | 0.535563 | 0.589157 | 0.584105 | 0.613914 | 0.613842 | 0.999095 | 0.797579 | 0.809469 | 0.303585 | 0.781735 | 4145000000.000000 | 3450000000.000000 | 0.471004 | 0.000533 | 0.205841 | 0.199570 | 0.199570 | 0.199612 | 0.238820 | 0.328623 | 0.046357 | 0.116155 | 0.193493 | 0.022153 | 0.848123 | 0.689647 | 0.689647 | 0.217622 | 7390000000.000000 | 0.000499 | 0.264388 | 0.386731 | 0.016270 | 0.012249 | 0.631125 | 0.009273 | 0.148804 | 0.927109 | 0.006847 | 0.376271 | 0.005764 | 0.115927 | 0.191607 | 0.404551 | 0.176912 | 0.001455 | 0.008973 | 4620000000.000000 | 0.003652 | 0.042903 | 0.035855 | 0.401851 | 0.015020 | 0.850383 | 0.540594 | 0.689051 | 0.161073 | 0.012951 | 0.012806 | 0.119523 | 0.360915 | 0.277429 | 0.011147 | 0.942027 | 0.738560 | 0.332322 | 0.009006 | 0.944811 | 0.002492 | 0.035930 | 0.000453 | 4900000000.000000 | 0.594002 | 4510000000.000000 | 0.671587 | 0.372200 | 0.942027 | 0.332322 | 0.117106 | 0.663062 | 0.464236 | 0.624769 | 0.317707 | 0.038375 | 0.000000 | 0.826455 | 0.005270 | 0.624168 | 0.613913 | 0.842357 | 0.281449 | 0.026913 | 0.565725 | 0.052838 |
| 0.900000 | 0.573948 | 0.626428 | 0.620590 | 0.623153 | 0.623088 | 0.999192 | 0.797713 | 0.809588 | 0.303720 | 0.781856 | 7980000000.000000 | 6160000000.000000 | 0.480048 | 0.000785 | 0.269967 | 0.224828 | 0.224904 | 0.224938 | 0.258788 | 0.337023 | 0.076958 | 0.133947 | 0.213452 | 0.022266 | 0.848285 | 0.690014 | 0.690009 | 0.217663 | 8390000000.000000 | 0.000583 | 0.264969 | 0.394263 | 0.027156 | 0.021649 | 0.632298 | 0.014992 | 0.183934 | 0.955945 | 0.010276 | 0.380774 | 0.006660 | 0.133750 | 0.210627 | 0.411261 | 0.254873 | 0.002480 | 0.011756 | 7990000000.000000 | 5600000000.000000 | 0.065000 | 0.079132 | 0.414138 | 0.031668 | 0.892001 | 0.684313 | 0.827327 | 0.299586 | 0.022673 | 0.030686 | 0.159749 | 0.381650 | 0.277953 | 0.018629 | 0.990066 | 0.740729 | 0.336635 | 0.020790 | 0.952976 | 0.002708 | 0.054500 | 7081999999.999998 | 8230000000.000000 | 0.594067 | 7381999999.999998 | 0.671613 | 0.558893 | 0.990066 | 0.336635 | 0.123092 | 0.697145 | 0.475959 | 0.658502 | 0.322774 | 0.049892 | 0.000000 | 0.844854 | 0.013603 | 0.624699 | 0.623152 | 0.843529 | 0.285573 | 0.027278 | 0.566989 | 0.086886 |
| 0.990000 | 0.664382 | 0.720662 | 0.708394 | 0.652017 | 0.651990 | 0.999403 | 0.798157 | 0.809967 | 0.304820 | 0.782268 | 9790000000.000000 | 9440000000.000000 | 0.516604 | 758199999.999997 | 0.643561 | 0.310391 | 0.310391 | 0.310391 | 0.332121 | 0.368013 | 0.215246 | 0.200606 | 0.282760 | 0.024486 | 0.850907 | 0.697283 | 0.696784 | 0.218559 | 9758199999.999996 | 0.001199 | 0.267758 | 0.421013 | 0.074595 | 0.065860 | 0.638289 | 0.039333 | 0.238667 | 0.985851 | 0.048392 | 0.400201 | 0.010043 | 0.199340 | 0.276928 | 0.434200 | 0.522969 | 0.025452 | 0.022410 | 9730000000.000000 | 9460000000.000000 | 0.175442 | 0.277771 | 0.505090 | 0.165605 | 0.959736 | 0.866496 | 0.958149 | 0.676546 | 0.066620 | 0.215212 | 0.225941 | 0.467819 | 0.283654 | 1415599999.999977 | 1.000000 | 0.743964 | 0.354053 | 1398199999.999997 | 0.970548 | 0.003721 | 0.128899 | 9788199999.999996 | 9830000000.000000 | 0.594542 | 9620000000.000000 | 0.671775 | 0.790928 | 1.000000 | 0.354053 | 0.141850 | 0.806048 | 0.552166 | 0.735353 | 0.338218 | 0.113040 | 0.000000 | 0.888286 | 0.166552 | 0.633377 | 0.652014 | 0.846425 | 0.300099 | 0.036506 | 0.572975 | 0.233943 |
| 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9990000000.000000 | 9980000000.000000 | 1.000000 | 990000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 3020000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9990000000.000000 | 9330000000.000000 | 1.000000 | 1.000000 | 2750000000.000000 | 9230000000.000000 | 1.000000 | 9940000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9740000000.000000 | 9730000000.000000 | 9990000000.000000 | 9990000000.000000 | 1.000000 | 8810000000.000000 | 1.000000 | 9570000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 8820000000.000000 | 9650000000.000000 | 1.000000 | 1.000000 | 1.000000 | 9910000000.000000 | 1.000000 | 1.000000 | 1.000000 | 9540000000.000000 | 1.000000 | 1.000000 | 1.000000 | 10000000000.000000 | 10000000000.000000 | 1.000000 | 10000000000.000000 | 1.000000 | 8320000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9820000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
Features whose values run into the billions have a much higher magnitude than the other features at the .99 percentile
# Setting the size of the plot
plt.figure(figsize=(12,12),dpi=200)
sns.heatmap(bank_data.corr(),) # heatmap of the correlation between the features
def correlation(dataset, threshold):
    '''
    dataset: dataframe
    threshold: threshold value for the correlation
    Type:dataset: dataframe
    Type:threshold: float
    This function removes, in place, every feature whose correlation with an earlier
    retained feature is greater than or equal to the threshold value.
    The signed correlation is compared, so strongly negative correlations are kept.
    '''
    col_corr = set() # Set of all the names of deleted columns
    corr_matrix = dataset.corr() # Correlation matrix
    # Iterating through the lower triangle of the correlation matrix
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if (corr_matrix.iloc[i, j] >= threshold) and (corr_matrix.columns[j] not in col_corr):
                colname = corr_matrix.columns[i] # getting the name of column
                col_corr.add(colname)
                if colname in dataset.columns:
                    del dataset[colname] # deleting the column from the dataset
# Setting a copy of the dataset
bank_data_stage_1=bank_data.copy()
correlation(bank_data_stage_1,.85)
high_corr_cols=set(bank_data.columns)-set(bank_data_stage_1.columns)
plt.figure(figsize=(12,12),dpi=200)
sns.heatmap(bank_data[list(high_corr_cols)].corr(),annot=True) # a set cannot be used as a column indexer in recent pandas
display(bank_data_stage_1.head()) # printing the first 5 rows of the dataset
print('Printing shape of the dataset after removing the features with high correlation to check the features left')
bank_data_stage_1.shape # printing the shape of the dataset
| | Bankrupt? | ROA(C) before interest and depreciation before interest | Operating Gross Margin | Operating Profit Rate | Non-industry income and expenditure/revenue | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Fixed Assets to Assets | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Total assets to GNP price | No-credit Interval | Net Income to Stockholder's Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.370594 | 0.601457 | 0.998969 | 0.302646 | 1.256969e-04 | 0.0 | 0.458143 | 0.000725 | 0.0 | 0.147950 | 0.169141 | 0.311664 | 0.017560 | 0.022102 | 0.848195 | 0.688979 | 0.217535 | 4.980000e+09 | 0.000327 | 0.263100 | 0.363725 | 0.002259 | 0.001208 | 0.629951 | 0.021266 | 0.207576 | 0.792424 | 0.005024 | 0.390284 | 0.006479 | 0.398036 | 0.086957 | 0.001814 | 0.003487 | 1.820926e-04 | 1.165007e-04 | 0.032903 | 0.034164 | 0.392913 | 0.037135 | 0.672775 | 0.166673 | 0.190643 | 0.004094 | 0.001997 | 1.473360e-04 | 0.147308 | 0.276920 | 0.001036 | 0.676269 | 0.721275 | 0.025592 | 0.903225 | 0.002022 | 0.064856 | 7.010000e+08 | 6.550000e+09 | 0.593831 | 4.580000e+08 | 0.424206 | 0.126549 | 0.637555 | 0.458609 | 0.520382 | 0.312905 | 0.118250 | 0 | 0.009219 | 0.622879 | 0.827890 | 0.026601 | 0.564050 | 0.016469 |
| 1 | 1 | 0.464291 | 0.610235 | 0.998946 | 0.303556 | 2.897851e-04 | 0.0 | 0.461867 | 0.000647 | 0.0 | 0.182251 | 0.208944 | 0.318137 | 0.021144 | 0.022080 | 0.848088 | 0.689693 | 0.217620 | 6.110000e+09 | 0.000443 | 0.264516 | 0.376709 | 0.006016 | 0.004039 | 0.635172 | 0.012502 | 0.171176 | 0.828824 | 0.005059 | 0.376760 | 0.005835 | 0.397725 | 0.064468 | 0.001286 | 0.004917 | 9.360000e+09 | 7.190000e+08 | 0.025484 | 0.006889 | 0.391590 | 0.012335 | 0.751111 | 0.127236 | 0.182419 | 0.014948 | 0.004136 | 1.383910e-03 | 0.056963 | 0.289642 | 0.005210 | 0.308589 | 0.731975 | 0.023947 | 0.931065 | 0.002226 | 0.025516 | 1.065198e-04 | 7.700000e+09 | 0.593916 | 2.490000e+09 | 0.468828 | 0.120916 | 0.641100 | 0.459001 | 0.567101 | 0.314163 | 0.047775 | 0 | 0.008323 | 0.623652 | 0.839969 | 0.264577 | 0.570175 | 0.020794 |
| 2 | 1 | 0.426071 | 0.601450 | 0.998857 | 0.302035 | 2.361297e-04 | 25500000.0 | 0.458521 | 0.000790 | 0.0 | 0.177911 | 0.180581 | 0.307102 | 0.005944 | 0.022760 | 0.848094 | 0.689463 | 0.217601 | 7.280000e+09 | 0.000396 | 0.264184 | 0.368913 | 0.011543 | 0.005348 | 0.629631 | 0.021248 | 0.207516 | 0.792484 | 0.005100 | 0.379093 | 0.006562 | 0.406580 | 0.014993 | 0.001495 | 0.004227 | 6.500000e+07 | 2.650000e+09 | 0.013387 | 0.028997 | 0.381968 | 0.141016 | 0.829502 | 0.340201 | 0.602806 | 0.000991 | 0.006302 | 5.340000e+09 | 0.098162 | 0.277456 | 0.013879 | 0.446027 | 0.742729 | 0.003715 | 0.909903 | 0.002060 | 0.021387 | 1.791094e-03 | 1.022676e-03 | 0.594502 | 7.610000e+08 | 0.276179 | 0.117922 | 0.642765 | 0.459254 | 0.538491 | 0.314515 | 0.025346 | 0 | 0.040003 | 0.623841 | 0.836774 | 0.026555 | 0.563706 | 0.016474 |
| 3 | 1 | 0.399844 | 0.583541 | 0.998700 | 0.303350 | 1.078888e-04 | 0.0 | 0.465705 | 0.000449 | 0.0 | 0.154187 | 0.193722 | 0.321674 | 0.014368 | 0.022046 | 0.848005 | 0.689110 | 0.217568 | 4.880000e+09 | 0.000382 | 0.263371 | 0.384077 | 0.004194 | 0.002896 | 0.630228 | 0.009572 | 0.151465 | 0.848535 | 0.005047 | 0.379743 | 0.005366 | 0.397925 | 0.089955 | 0.001966 | 0.003215 | 7.130000e+09 | 9.150000e+09 | 0.028065 | 0.015463 | 0.378497 | 0.021320 | 0.725754 | 0.161575 | 0.225815 | 0.018851 | 0.002961 | 1.010646e-03 | 0.098715 | 0.276580 | 0.003540 | 0.615848 | 0.729825 | 0.022165 | 0.906902 | 0.001831 | 0.024161 | 8.140000e+09 | 6.050000e+09 | 0.593889 | 2.030000e+09 | 0.559144 | 0.120760 | 0.579039 | 0.448518 | 0.604105 | 0.302382 | 0.067250 | 0 | 0.003252 | 0.622929 | 0.834697 | 0.026697 | 0.564663 | 0.023982 |
| 4 | 1 | 0.465022 | 0.598783 | 0.998973 | 0.303475 | 7.890000e+09 | 0.0 | 0.462746 | 0.000686 | 0.0 | 0.167502 | 0.212537 | 0.319162 | 0.029690 | 0.022096 | 0.848258 | 0.689697 | 0.217626 | 5.510000e+09 | 0.000439 | 0.265218 | 0.379690 | 0.006022 | 0.003727 | 0.636055 | 0.005150 | 0.106509 | 0.893491 | 0.005303 | 0.375025 | 0.006624 | 0.400079 | 0.175412 | 0.001449 | 0.004367 | 1.633674e-04 | 2.935211e-04 | 0.040161 | 0.058111 | 0.394371 | 0.023988 | 0.751822 | 0.260330 | 0.358380 | 0.014161 | 0.004275 | 6.804636e-04 | 0.110195 | 0.287913 | 0.004869 | 0.975007 | 0.732000 | 0.000000 | 0.913850 | 0.002224 | 0.026385 | 6.680000e+09 | 5.050000e+09 | 0.593915 | 8.240000e+08 | 0.309555 | 0.110933 | 0.622374 | 0.454411 | 0.578469 | 0.311567 | 0.047725 | 0 | 0.003878 | 0.623521 | 0.839973 | 0.024752 | 0.575617 | 0.035490 |
Printing shape of the dataset after removing the features with high correlation to check the features left
(6819, 74)
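The filtering step above can be illustrated on a small synthetic frame. This is only a sketch: `drop_correlated` is a hypothetical, non-mutating variant of `correlation` (it returns a new dataframe instead of deleting columns in place), and the toy columns are made up for the example.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold):
    """Hypothetical helper: return a copy of df without the columns whose
    correlation with an earlier, retained column is >= threshold."""
    corr = df.corr()
    dropped = set()
    for i in range(len(corr.columns)):
        for j in range(i):
            if corr.iloc[i, j] >= threshold and corr.columns[j] not in dropped:
                dropped.add(corr.columns[i])
    return df.drop(columns=list(dropped)), dropped

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + 0.01 * rng.normal(size=200),  # nearly collinear with "a"
    "b": rng.normal(size=200),                       # independent noise
})

reduced, dropped = drop_correlated(toy, 0.85)
print(dropped)                   # names of the removed columns
print(reduced.columns.tolist())  # columns that survive the filter
```

Returning a new dataframe avoids the need to call `.copy()` before filtering, at the cost of a slightly different interface from the function used in this notebook.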
(display_quatile_dist(bank_data_stage_1,bank_data_stage_1.columns))
| | Bankrupt? | ROA(C) before interest and depreciation before interest | Operating Gross Margin | Operating Profit Rate | Non-industry income and expenditure/revenue | Operating Expense Rate | Research and development expense rate | Cash flow rate | Interest-bearing debt interest rate | Tax rate (A) | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons | Cash Flow Per Share | Revenue Per Share (Yuan ¥) | Realized Sales Gross Profit Growth Rate | Operating Profit Growth Rate | After-tax Net Profit Growth Rate | Continuous Net Profit Growth Rate | Total Asset Growth Rate | Net Value Growth Rate | Total Asset Return Growth Rate Ratio | Cash Reinvestment % | Current Ratio | Quick Ratio | Interest Expense Ratio | Total debt/Total net worth | Debt ratio % | Net worth/Assets | Long-term fund suitability ratio (A) | Borrowing dependency | Contingent liabilities/Net worth | Inventory and accounts receivable/Net value | Total Asset Turnover | Accounts Receivable Turnover | Average Collection Days | Inventory Turnover Rate (times) | Fixed Assets Turnover Frequency | Net Worth Turnover Rate (times) | Revenue per person | Operating profit per person | Allocation rate per person | Working Capital to Total Assets | Quick Assets/Total Assets | Current Assets/Total Assets | Cash/Total Assets | Quick Assets/Current Liability | Cash/Current Liability | Current Liability to Assets | Inventory/Working Capital | Inventory/Current Liability | Current Liabilities/Liability | Working Capital/Equity | Long-term Liability to Current Assets | Retained Earnings to Total Assets | Total income/Total expense | Total expense/Assets | Current Asset Turnover Rate | Quick Asset Turnover Rate | Working capitcal Turnover Rate | Cash Turnover Rate | Fixed Assets to Assets | Equity to Long-term Liability | Cash Flow to Total Assets | Cash Flow to Liability | CFO to Assets | Cash Flow to Equity | Current Liability to Current Assets | Liability-Assets Flag | Total assets to GNP price | No-credit Interval | Net Income to Stockholder's Equity | Degree of Financial Leverage (DFL) | Interest Coverage Ratio (Interest expense to EBIT) | Equity to Liability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.000000 | 0.337876 | 0.580794 | 0.997582 | 0.302198 | 0.000102 | 0.000000 | 0.437551 | 0.000000 | 0.000000 | 0.140514 | 0.163861 | 0.281974 | 0.001470 | 0.021671 | 0.846386 | 0.678493 | 0.216612 | 0.000102 | 0.000324 | 0.261876 | 0.321131 | 0.002390 | 0.000408 | 0.624236 | 0.000493 | 0.014149 | 0.761333 | 0.004911 | 0.369637 | 0.005366 | 0.393922 | 0.008996 | 0.000335 | 0.000410 | 0.000102 | 0.000104 | 0.010352 | 0.002279 | 0.357285 | 0.000226 | 0.684405 | 0.040822 | 0.085777 | 0.002648 | 0.000549 | 0.000173 | 0.011704 | 0.273258 | 0.000000 | 0.205660 | 0.723198 | 0.000000 | 0.853982 | 0.001835 | 0.004293 | 0.000102 | 0.000102 | 0.593794 | 0.000103 | 0.003751 | 0.110933 | 0.523566 | 0.394009 | 0.429163 | 0.295183 | 0.003790 | 0.000000 | 0.000174 | 0.615426 | 0.826098 | 0.025322 | 0.555236 | 0.013800 |
| 0.100000 | 0.000000 | 0.442344 | 0.596513 | 0.998865 | 0.303331 | 0.000121 | 0.000000 | 0.457875 | 0.000000 | 0.000000 | 0.162606 | 0.201097 | 0.311594 | 0.009284 | 0.022016 | 0.847886 | 0.688892 | 0.217539 | 0.000185 | 0.000413 | 0.263353 | 0.365020 | 0.005750 | 0.002625 | 0.630170 | 0.001619 | 0.044055 | 0.816066 | 0.005071 | 0.369637 | 0.005366 | 0.395706 | 0.047976 | 0.000544 | 0.002646 | 0.000123 | 0.000145 | 0.016774 | 0.006524 | 0.387212 | 0.001569 | 0.748224 | 0.144206 | 0.238380 | 0.014580 | 0.003140 | 0.000793 | 0.034457 | 0.276971 | 0.001048 | 0.460519 | 0.731773 | 0.000000 | 0.916344 | 0.002077 | 0.009190 | 0.000119 | 0.000116 | 0.593912 | 0.000141 | 0.033545 | 0.110933 | 0.609768 | 0.450234 | 0.530051 | 0.309412 | 0.010763 | 0.000000 | 0.000479 | 0.622950 | 0.837845 | 0.026650 | 0.564583 | 0.019089 |
| 0.250000 | 0.000000 | 0.476527 | 0.600445 | 0.998969 | 0.303466 | 0.000157 | 0.000128 | 0.461558 | 0.000203 | 0.000000 | 0.173613 | 0.214711 | 0.317748 | 0.015631 | 0.022065 | 0.847984 | 0.689270 | 0.217580 | 4860000000.000000 | 0.000441 | 0.263759 | 0.374749 | 0.007555 | 0.004726 | 0.630612 | 0.003007 | 0.072891 | 0.851196 | 0.005244 | 0.370168 | 0.005366 | 0.397403 | 0.076462 | 0.000710 | 0.004387 | 0.000173 | 0.000233 | 0.021774 | 0.010433 | 0.392438 | 0.004121 | 0.774309 | 0.241973 | 0.352845 | 0.033543 | 0.005240 | 0.001973 | 0.053301 | 0.277034 | 0.003163 | 0.626981 | 0.733612 | 0.000000 | 0.931097 | 0.002236 | 0.014567 | 0.000146 | 0.000142 | 0.593934 | 0.000274 | 0.085360 | 0.110933 | 0.633265 | 0.457116 | 0.565987 | 0.312995 | 0.018034 | 0.000000 | 0.000904 | 0.623636 | 0.840115 | 0.026791 | 0.565158 | 0.024477 |
| 0.500000 | 0.000000 | 0.502706 | 0.605997 | 0.999022 | 0.303525 | 0.000278 | 509000000.000000 | 0.465080 | 0.000321 | 0.073489 | 0.184400 | 0.224544 | 0.322487 | 0.027376 | 0.022102 | 0.848044 | 0.689439 | 0.217598 | 6400000000.000000 | 0.000462 | 0.264050 | 0.380425 | 0.010587 | 0.007412 | 0.630698 | 0.005546 | 0.111407 | 0.888593 | 0.005665 | 0.372624 | 0.005366 | 0.400131 | 0.118441 | 0.000968 | 0.006573 | 0.000765 | 0.000593 | 0.029516 | 0.018616 | 0.395898 | 0.007844 | 0.810275 | 0.386451 | 0.514830 | 0.074887 | 0.007909 | 0.004904 | 0.082705 | 0.277178 | 0.006497 | 0.806881 | 0.736013 | 0.001975 | 0.937672 | 0.002336 | 0.022674 | 0.000199 | 0.000225 | 0.593963 | 1080000000.000000 | 0.196881 | 0.112340 | 0.645366 | 0.459750 | 0.593266 | 0.314953 | 0.027597 | 0.000000 | 0.002085 | 0.623879 | 0.841179 | 0.026808 | 0.565252 | 0.033798 |
| 0.750000 | 0.000000 | 0.535563 | 0.613914 | 0.999095 | 0.303585 | 4145000000.000000 | 3450000000.000000 | 0.471004 | 0.000533 | 0.205841 | 0.199570 | 0.238820 | 0.328623 | 0.046357 | 0.022153 | 0.848123 | 0.689647 | 0.217622 | 7390000000.000000 | 0.000499 | 0.264388 | 0.386731 | 0.016270 | 0.012249 | 0.631125 | 0.009273 | 0.148804 | 0.927109 | 0.006847 | 0.376271 | 0.005764 | 0.404551 | 0.176912 | 0.001455 | 0.008973 | 4620000000.000000 | 0.003652 | 0.042903 | 0.035855 | 0.401851 | 0.015020 | 0.850383 | 0.540594 | 0.689051 | 0.161073 | 0.012951 | 0.012806 | 0.119523 | 0.277429 | 0.011147 | 0.942027 | 0.738560 | 0.009006 | 0.944811 | 0.002492 | 0.035930 | 0.000453 | 4900000000.000000 | 0.594002 | 4510000000.000000 | 0.372200 | 0.117106 | 0.663062 | 0.464236 | 0.624769 | 0.317707 | 0.038375 | 0.000000 | 0.005270 | 0.624168 | 0.842357 | 0.026913 | 0.565725 | 0.052838 |
| 0.900000 | 0.000000 | 0.573948 | 0.623153 | 0.999192 | 0.303720 | 7980000000.000000 | 6160000000.000000 | 0.480048 | 0.000785 | 0.269967 | 0.224828 | 0.258788 | 0.337023 | 0.076958 | 0.022266 | 0.848285 | 0.690014 | 0.217663 | 8390000000.000000 | 0.000583 | 0.264969 | 0.394263 | 0.027156 | 0.021649 | 0.632298 | 0.014992 | 0.183934 | 0.955945 | 0.010276 | 0.380774 | 0.006660 | 0.411261 | 0.254873 | 0.002480 | 0.011756 | 7990000000.000000 | 5600000000.000000 | 0.065000 | 0.079132 | 0.414138 | 0.031668 | 0.892001 | 0.684313 | 0.827327 | 0.299586 | 0.022673 | 0.030686 | 0.159749 | 0.277953 | 0.018629 | 0.990066 | 0.740729 | 0.020790 | 0.952976 | 0.002708 | 0.054500 | 7081999999.999998 | 8230000000.000000 | 0.594067 | 7381999999.999998 | 0.558893 | 0.123092 | 0.697145 | 0.475959 | 0.658502 | 0.322774 | 0.049892 | 0.000000 | 0.013603 | 0.624699 | 0.843529 | 0.027278 | 0.566989 | 0.086886 |
| 0.990000 | 1.000000 | 0.664382 | 0.652017 | 0.999403 | 0.304820 | 9790000000.000000 | 9440000000.000000 | 0.516604 | 758199999.999997 | 0.643561 | 0.310391 | 0.332121 | 0.368013 | 0.215246 | 0.024486 | 0.850907 | 0.697283 | 0.218559 | 9758199999.999996 | 0.001199 | 0.267758 | 0.421013 | 0.074595 | 0.065860 | 0.638289 | 0.039333 | 0.238667 | 0.985851 | 0.048392 | 0.400201 | 0.010043 | 0.434200 | 0.522969 | 0.025452 | 0.022410 | 9730000000.000000 | 9460000000.000000 | 0.175442 | 0.277771 | 0.505090 | 0.165605 | 0.959736 | 0.866496 | 0.958149 | 0.676546 | 0.066620 | 0.215212 | 0.225941 | 0.283654 | 1415599999.999977 | 1.000000 | 0.743964 | 1398199999.999997 | 0.970548 | 0.003721 | 0.128899 | 9788199999.999996 | 9830000000.000000 | 0.594542 | 9620000000.000000 | 0.790928 | 0.141850 | 0.806048 | 0.552166 | 0.735353 | 0.338218 | 0.113040 | 0.000000 | 0.166552 | 0.633377 | 0.846425 | 0.036506 | 0.572975 | 0.233943 |
| 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9990000000.000000 | 9980000000.000000 | 1.000000 | 990000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 3020000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9990000000.000000 | 9330000000.000000 | 1.000000 | 1.000000 | 2750000000.000000 | 9230000000.000000 | 1.000000 | 9940000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9740000000.000000 | 9730000000.000000 | 9990000000.000000 | 9990000000.000000 | 1.000000 | 8810000000.000000 | 1.000000 | 9570000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 8820000000.000000 | 9650000000.000000 | 1.000000 | 1.000000 | 9910000000.000000 | 1.000000 | 1.000000 | 9540000000.000000 | 1.000000 | 1.000000 | 1.000000 | 10000000000.000000 | 10000000000.000000 | 1.000000 | 10000000000.000000 | 8320000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9820000000.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
# storing features in a list
features=[x for x in bank_data_stage_1.columns if x not in ["Bankrupt?"]]
def graph_label_title(x_label=False,y_label=False,title=False):
    '''
    x_label: label of the x-axis
    y_label: label of the y-axis
    title: title of the graph
    This function sets the x-label, y-label and title of the current graph;
    any argument left as False is skipped
    '''
    if x_label:
        plt.xlabel(x_label,fontsize=25)
    if y_label:
        plt.ylabel(y_label,fontsize=25)
    if title:
        plt.title(title,fontsize=25)
# Computing the skewness of every feature (target and binary flag excluded), sorted in ascending order
skew_table=bank_data_stage_1.drop(columns=['Bankrupt?'," Liability-Assets Flag"]).skew().sort_values()
print( "\033[1m" + 'Top 5 features that skewed negatively' + "\033[0m")
display(pd.DataFrame(skew_table.head())) # printing the top 5 features that skewed negatively
print( "\033[1m" + 'Top 5 features that skewed positively' + "\033[0m")
display(pd.DataFrame(skew_table.tail())) # printing the top 5 features that skewed positively
Top 5 features that skewed negatively
| | 0 |
|---|---|
| Operating Profit Growth Rate | -71.688950 |
| Operating Profit Rate | -70.237164 |
| Net Income to Stockholder's Equity | -37.964701 |
| Working Capital/Equity | -36.203654 |
| Working capitcal Turnover Rate | -28.584611 |
Top 5 features that skewed positively
| | 0 |
|---|---|
| Contingent liabilities/Net worth | 79.670620 |
| Net Value Growth Rate | 80.291844 |
| Total income/Total expense | 82.332424 |
| Current Ratio | 82.577237 |
| Fixed Assets to Assets | 82.577237 |
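The largest skewness reported above, 82.577237, equals the square root of the row count (6819) to the displayed precision. That is the value pandas' sample skewness takes when one entry dwarfs all the others, which suggests those columns are dominated by a single extreme value. A minimal sketch with synthetic data (not the notebook's dataframe), assuming one entry of 2.75e9 among values in [0, 1):

```python
import numpy as np
import pandas as pd

n_rows = 6819  # same row count as the dataset; the values themselves are synthetic
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 1.0, size=n_rows)
values[0] = 2.75e9  # one extreme entry, mimicking the maximum seen for Current Ratio

toy_skew = pd.Series(values).skew()  # bias-adjusted sample skewness
print(round(toy_skew, 4), round(np.sqrt(n_rows), 4))
```

Both printed numbers should agree, so a skewness close to the square root of the number of rows is a quick hint that removing a single row may be enough to tame the column.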
# Plotting the distribution of skewness of features
plt.figure(figsize=(5,3),dpi=200) # setting the size of the plot
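sns.histplot(skew_table,kde=True,stat="density") # histplot replaces distplot, which is deprecated in recent seaborn releases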
graph_label_title('skewed distributions',False,'Distributions of Skewness of features')
bank_data["Bankrupt?"].value_counts() # printing the value counts of the target variable
0    6599
1     220
Name: Bankrupt?, dtype: int64
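The counts above show a strong class imbalance, which is why the outlier handling later treats the two classes differently. A short sketch of the arithmetic, with the counts typed in by hand from the output above rather than read from `bank_data`:

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({0: 6599, 1: 220}, name="Bankrupt?")

share = counts / counts.sum()     # fraction of rows per class
minority_share = share.loc[1]     # bankrupt companies
print(share.round(4))
print(f"minority share: {minority_share:.2%}")
```

With only about 3% of rows in the bankrupt class, dropping even a handful of minority rows removes a noticeable part of the signal.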
def print_skewness_and_handle_outlier(skewness_range_1,skewness_range_2,data,skewness,percentile=.99,inplace=False):
    '''
    skewness_range_1: lower bound (inclusive) of the skewness range
    skewness_range_2: upper bound (exclusive) of the skewness range
    data: dataframe
    skewness: '+' for positively skewed features, '-' for negatively skewed features
    percentile: cutoff compared directly against the feature values (not a quantile of the data)
    inplace: indication that the dataframe is to be altered or not
    TYPE:
    skewness_range_1: int
    skewness_range_2: int
    data: dataframe
    skewness: string
    percentile: float
    inplace: boolean
    This function extracts the features whose skewness lies in the given range and handles their outliers
    using two helper functions:
    1. skewness_Range: returns the features whose skewness lies in the range
    2. return_and_print_outliers_: finds the outliers and shows the data after handling them
    For the selected features it presents, BEFORE and AFTER removing the outliers:
    1. The first 5 rows of the dataframe (before only)
    2. The box plot of the features
    3. The percentile distribution
    4. Histograms, to check how much the skewness is reduced by removing the outliers
    It returns the list of selected features
    '''
    data_=data.copy()
    skew_cols=skewness_Range(skewness_range_1,skewness_range_2)
    print("*"*4+f'Displaying first 5 rows of features that are having it skewness from {skewness_range_1} to {skewness_range_2}'+"*"*4)
    display(data[skew_cols].head())
    print("*"*4+' Box plot before removal of outliers'+"*"*4)
    plt.figure(figsize=(15,3))
    ax = sns.boxplot(data=data_[skew_cols], orient="h", palette="Set2")
    plt.show()
    print('\n'+"*"*4+f' Displaying percentile of columns that are skewed from {skewness_range_1} to {skewness_range_2} '+"*"*4)
    print('\n'+'*'*55+"BEFORE"+'*'*55)
    display_quatile_dist(data_,skew_cols)
    plt.show()
    print("\nSKEWNESS")
    print('\n'+'*'*55+"BEFORE"+'*'*55)
    data_[skew_cols].hist(bins=50,figsize=(12,8))
    plt.show()
    outlier_indices=return_and_print_outliers_(data,skew_cols,percentile,skewness,inplace)
    # Only the majority class outlier rows are dropped
    data.drop(list(outlier_indices[0]),inplace=inplace)
    return skew_cols
def return_and_print_outliers_(data,cols,percentile,skewness,inplace):
    '''
    data: dataframe
    cols: list of columns
    percentile: cutoff compared directly against the feature values (not a quantile of the data)
    skewness: '+' for positively skewed features, '-' for negatively skewed features
    inplace: indication that the dataframe is to be altered or not
    TYPE :
    data: dataframe
    cols: list
    percentile: float
    skewness: string
    inplace: boolean
    1. This function identifies values beyond the supplied cutoff and, when inplace is True,
       imputes the column median for the minority class rows
    2. It displays the histograms, box plot and percentile distribution after removing the outliers
    It returns the indices of the majority class and minority class rows that were flagged as outliers
    '''
    print(f'\n\n {"*"*10}Outliers beyond {percentile} percentile {"*"*10}')
    # set used to collect majority class outlier indices
    outliers_indcies=set()
    # set used to collect minority class outlier indices
    outliers_indcies_minority=set()
    # dataframe to store the outliers from majority class
    a=pd.DataFrame()
    # dataframe to store the outliers from minority class
    b=pd.DataFrame()
    # Iterating over the columns supplied
    for i in cols:
        # positive skewness: supply an upper cutoff such as .99, .95 or .90
        if skewness=="+":
            # storing rows whose value is above the cutoff
            outlier_=data[(data[i]> percentile)]
            # storing the outliers belonging to majority class
            outlier=outlier_[(outlier_["Bankrupt?"]!=1)]
            # storing the outliers belonging to minority class
            outlier_minority=outlier_[(outlier_["Bankrupt?"]!=0)]
            # checking that minority outliers were captured and that the original dataframe may be altered
            if data.loc[(data[i]> percentile) & (data["Bankrupt?"]!=0),[i]].shape[0] > 0 and inplace:
                # imputing overall median for minority class outliers ("=" assigns; the earlier "==" only compared)
                data.loc[(data[i]> percentile) & (data["Bankrupt?"]!=0),[i]]=data[i].median()
        # negative skewness: supply a lower cutoff such as .01, .05 or .10
        elif skewness=='-':
            # storing rows whose value is below the cutoff
            outlier_=data[(data[i] < percentile)]
            # storing the outliers belonging to majority class
            outlier=outlier_[(outlier_["Bankrupt?"]!=1)]
            # storing the outliers belonging to minority class
            outlier_minority=outlier_[(outlier_["Bankrupt?"]!=0)]
            # checking that minority outliers were captured and that the original dataframe may be altered
            if data.loc[(data[i] < percentile) & (data["Bankrupt?"]!=0),[i]].shape[0] > 0 and inplace:
                # imputing overall median for minority class outliers
                data.loc[(data[i] < percentile) & (data["Bankrupt?"]!=0),[i]]=data[i].median()
        # Storing indices of the outliers captured for majority class
        outliers_indcies.update(outlier.index)
        # Storing indices of the outliers captured for minority class
        outliers_indcies_minority.update(outlier_minority.index)
        # Concatenating the outliers captured for majority class
        a=pd.concat([a,outlier])
        # Concatenating the outliers captured for minority class
        b=pd.concat([b,outlier_minority])
    # Displaying the outliers captured for majority class and minority class
    if a[cols].shape[1]!=0:
        modified_data=pd.concat([a,b]).loc[list(outliers_indcies)+list(outliers_indcies_minority),cols+["Bankrupt?"]]
        try:
            display(modified_data.reset_index(drop=True).style.bar())
        except Exception:
            # falling back to a plain display when the styled table cannot be rendered
            display(modified_data.reset_index())
    print(len(outliers_indcies_minority),'minority class')
    # Presentation Part
    print(f'{b.shape[0]} rows of specific column are imputed to median value ')
    print(f'{a.shape[0]} rows are removed ')
    print("\nSKEWNESS")
    print('\n'+'*'*55+"AFTER"+'*'*55)
    data.drop(list(outliers_indcies)+list(outliers_indcies_minority))[cols].hist(bins=50,figsize=(12,8))
    plt.show()
    print("*"*4+' Box plot after removal of outliers'+"*"*4)
    plt.figure(figsize=(15,3))
    sns.boxplot(data=data[cols].drop(list(outliers_indcies)+list(outliers_indcies_minority)), orient="h", palette="Set2")
    plt.show()
    print("\nPERCENTILE")
    print('*'*55+"AFTER"+'*'*55)
    display_quatile_dist(data[cols].drop(list(outliers_indcies)+list(outliers_indcies_minority)),cols)
    plt.show()
    return [outliers_indcies,outliers_indcies_minority]
def skewness_Range(range1,range2):
    '''
    range1: lower bound (inclusive) of the skewness range
    range2: upper bound (exclusive) of the skewness range
    TYPE:
    range1: int or float
    range2: int or float
    This function returns the features in skew_table whose skewness lies in [range1, range2)
    '''
    cols_in_range=[]
    for ind in skew_table.index:
        if range2 > skew_table[ind] >= range1 :
            cols_in_range.append(ind)
    return cols_in_range
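Note that the helpers above compare the `percentile` argument directly with the feature values (`data[i] > percentile`), so `.90` acts as a fixed cutoff on the scaled values, not as the 90th percentile of each column. The sketch below contrasts the two readings on a made-up frame; the column name `ratio` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical scaled feature with a couple of large values
toy = pd.DataFrame({
    "ratio": [0.02, 0.03, 0.05, 0.10, 0.20, 0.40, 0.60, 0.95, 1.00, 0.01],
    "Bankrupt?": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})
cutoff = 0.90

# Reading used by the helpers above: a fixed cutoff on the values themselves
by_value = toy[toy["ratio"] > cutoff]

# Alternative reading: a cutoff at the 0.90 quantile of the column
by_quantile = toy[toy["ratio"] > toy["ratio"].quantile(cutoff)]

print(len(by_value), "rows above the fixed cutoff")
print(len(by_quantile), "rows above the 0.90 quantile")
```

The fixed cutoff only makes sense because most features here are scaled to the [0, 1] interval; on unscaled data a quantile-based cutoff would usually be the safer choice.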
These features contain very uneven values at the beginning or the end of their distributions. These extreme values are noise that affects model performance and resists any kind of transformation.
All of these features start their distribution at 0 (the 0th percentile is 0).
# 70 and 85 bound the skewness range: features whose skewness lies in [70, 85) are selected
# '+' indicates that we are checking positively skewed features
# .90 is the cutoff supplied: values above it are treated as outliers
print_skewness_and_handle_outlier(70,85,bank_data_stage_1,'+',.90,True)
****Displaying first 5 rows of features that are having it skewness from 70 to 85****
| | Realized Sales Gross Profit Growth Rate | Contingent liabilities/Net worth | Net Value Growth Rate | Total income/Total expense | Current Ratio | Fixed Assets to Assets |
|---|---|---|---|---|---|---|
| 0 | 0.022102 | 0.006479 | 0.000327 | 0.002022 | 0.002259 | 0.424206 |
| 1 | 0.022080 | 0.005835 | 0.000443 | 0.002226 | 0.006016 | 0.468828 |
| 2 | 0.022760 | 0.006562 | 0.000396 | 0.002060 | 0.011543 | 0.276179 |
| 3 | 0.022046 | 0.005366 | 0.000382 | 0.001831 | 0.004194 | 0.559144 |
| 4 | 0.022096 | 0.006624 | 0.000439 | 0.002224 | 0.006022 | 0.309555 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 70 to 85 ****
*******************************************************BEFORE*******************************************************
| | Realized Sales Gross Profit Growth Rate | Contingent liabilities/Net worth | Net Value Growth Rate | Total income/Total expense | Current Ratio | Fixed Assets to Assets |
|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.021671 | 0.005366 | 0.000324 | 0.001835 | 0.002390 | 0.003751 |
| 0.100000 | 0.022016 | 0.005366 | 0.000413 | 0.002077 | 0.005750 | 0.033545 |
| 0.250000 | 0.022065 | 0.005366 | 0.000441 | 0.002236 | 0.007555 | 0.085360 |
| 0.500000 | 0.022102 | 0.005366 | 0.000462 | 0.002336 | 0.010587 | 0.196881 |
| 0.750000 | 0.022153 | 0.005764 | 0.000499 | 0.002492 | 0.016270 | 0.372200 |
| 0.900000 | 0.022266 | 0.006660 | 0.000583 | 0.002708 | 0.027156 | 0.558893 |
| 0.990000 | 0.024486 | 0.010043 | 0.001199 | 0.003721 | 0.074595 | 0.790928 |
| 1.000000 | 1.000000 | 1.000000 | 9330000000.000000 | 1.000000 | 2750000000.000000 | 8320000000.000000 |
SKEWNESS
*******************************************************BEFORE*******************************************************
**********Outliers beyond 0.9 percentile **********
| | Realized Sales Gross Profit Growth Rate | Contingent liabilities/Net worth | Net Value Growth Rate | Total income/Total expense | Current Ratio | Fixed Assets to Assets | Bankrupt? |
|---|---|---|---|---|---|---|---|
| 0 | 0.022121 | 0.005366 | 0.000445 | 0.002584 | 0.058422 | 0.983363 | 0 |
| 1 | 0.022092 | 0.005366 | 0.000784 | 0.001958 | 0.003541 | 0.957850 | 0 |
| 2 | 0.022124 | 0.010482 | 0.000474 | 0.005955 | 2750000000.000000 | 0.057384 | 0 |
| 3 | 0.022131 | 0.005366 | 0.000445 | 0.002232 | 0.000555 | 0.914384 | 0 |
| 4 | 0.022061 | 0.005366 | 0.000464 | 0.002236 | 0.001725 | 0.968198 | 0 |
| 5 | 0.021967 | 0.005366 | 0.000400 | 0.001869 | 0.000193 | 0.918705 | 0 |
| 6 | 0.022216 | 0.005366 | 1350000000.000000 | 0.001892 | 0.000000 | 0.882981 | 0 |
| 7 | 1.000000 | 0.005366 | 0.001381 | 0.002231 | 0.009978 | 0.032239 | 0 |
| 8 | 0.022096 | 0.005366 | 0.000444 | 0.002500 | 0.055196 | 0.983645 | 0 |
| 9 | 0.022155 | 0.005366 | 0.000534 | 0.002266 | 0.000566 | 0.901367 | 0 |
| 10 | 0.022052 | 0.006517 | 0.000454 | 1.000000 | 0.000604 | 0.002138 | 0 |
| 11 | 0.021601 | 0.006179 | 0.000520 | 0.000000 | 1.000000 | 0.000000 | 0 |
| 12 | 0.021992 | 0.005366 | 0.000443 | 0.002184 | 0.003734 | 0.998725 | 0 |
| 13 | 0.022073 | 0.005366 | 0.000444 | 0.002444 | 0.031061 | 0.941900 | 0 |
| 14 | 0.022092 | 0.005848 | 1.000000 | 0.002157 | 0.007650 | 0.168345 | 0 |
| 15 | 0.021528 | 1.000000 | 0.000204 | 0.001647 | 0.001086 | 0.620196 | 1 |
| 16 | 0.022107 | 0.005366 | 0.000437 | 0.002121 | 0.008113 | 8320000000.000000 | 1 |
| 17 | 0.021732 | 0.004445 | 9330000000.000000 | 0.001928 | 0.001261 | 0.652032 | 1 |
| 18 | 0.022193 | 0.005366 | 0.000331 | 0.001891 | 0.002554 | 1.000000 | 1 |
4 minority class
4 rows of specific column are imputed to median value
15 rows are removed
SKEWNESS
*******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE
*******************************************************AFTER*******************************************************
| | Realized Sales Gross Profit Growth Rate | Contingent liabilities/Net worth | Net Value Growth Rate | Total income/Total expense | Current Ratio | Fixed Assets to Assets |
|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000772 | 0.000163 | 0.000000 |
| 0.010000 | 0.021679 | 0.005366 | 0.000326 | 0.001837 | 0.002471 | 0.003918 |
| 0.100000 | 0.022016 | 0.005366 | 0.000413 | 0.002079 | 0.005771 | 0.033582 |
| 0.250000 | 0.022065 | 0.005366 | 0.000441 | 0.002236 | 0.007565 | 0.085271 |
| 0.500000 | 0.022102 | 0.005366 | 0.000462 | 0.002336 | 0.010598 | 0.196588 |
| 0.750000 | 0.022153 | 0.005764 | 0.000499 | 0.002492 | 0.016269 | 0.370672 |
| 0.900000 | 0.022267 | 0.006660 | 0.000582 | 0.002708 | 0.027070 | 0.555686 |
| 0.990000 | 0.024480 | 0.009963 | 0.001146 | 0.003703 | 0.073873 | 0.778641 |
| 1.000000 | 0.101643 | 0.073164 | 0.138678 | 0.021153 | 0.712630 | 0.899223 |
[' Realized Sales Gross Profit Growth Rate', ' Contingent liabilities/Net worth', ' Net Value Growth Rate', ' Total income/Total expense', ' Current Ratio', ' Fixed Assets to Assets']
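The percentile tables in this output list each column's value at the 0, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99 and 1.0 quantiles. A minimal sketch of how such a table can be built with pandas is shown here. The helper name `percentile_table` and the toy DataFrame are hypothetical and serve only as an illustration; the notebook's own helper may compute the table differently.

```python
import pandas as pd

# Quantile levels matching the first column of the percentile tables.
PERCENTILES = [0, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 1]


def percentile_table(df, cols):
    """Return one row per quantile level and one column per feature.

    Hypothetical helper, not the notebook's own function.
    """
    return df[cols].quantile(PERCENTILES)


# Toy data (made up for illustration): 101 evenly spaced values in [0, 1].
toy_pct = pd.DataFrame({"feature_a": [i / 100 for i in range(101)]})
pct = percentile_table(toy_pct, ["feature_a"])
print(pct)
```

A large gap between the 0.99 row and the 1.0 row, as in several of the BEFORE tables, is a quick signal that a handful of extreme values dominate the column.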
print_skewness_and_handle_outlier(60,70,bank_data_stage_1,'+',.8,True)
****Displaying first 5 rows of features that are having it skewness from 60 to 70****
| | Total Asset Return Growth Rate Ratio | Continuous Net Profit Growth Rate |
|---|---|---|
| 0 | 0.263100 | 0.217535 |
| 1 | 0.264516 | 0.217620 |
| 2 | 0.264184 | 0.217601 |
| 3 | 0.263371 | 0.217568 |
| 4 | 0.265218 | 0.217626 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 60 to 70 **** *******************************************************BEFORE*******************************************************
| Percentile | Total Asset Return Growth Rate Ratio | Continuous Net Profit Growth Rate |
|---|---|---|
| 0.000000 | 0.251620 | 0.000000 |
| 0.010000 | 0.261903 | 0.216645 |
| 0.100000 | 0.263354 | 0.217539 |
| 0.250000 | 0.263760 | 0.217580 |
| 0.500000 | 0.264050 | 0.217598 |
| 0.750000 | 0.264389 | 0.217622 |
| 0.900000 | 0.264969 | 0.217663 |
| 0.990000 | 0.267758 | 0.218561 |
| 1.000000 | 1.000000 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.8 percentile **********
| | Total Asset Return Growth Rate Ratio | Continuous Net Profit Growth Rate | Bankrupt? |
|---|---|---|---|
| 0 | 1.000000 | 0.218381 | 0 |
| 1 | 0.264583 | 1.000000 | 0 |
0 minority class
0 rows of specific column are imputed to median value
2 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Total Asset Return Growth Rate Ratio | Continuous Net Profit Growth Rate |
|---|---|---|
| 0.000000 | 0.251620 | 0.000000 |
| 0.010000 | 0.261903 | 0.216645 |
| 0.100000 | 0.263353 | 0.217539 |
| 0.250000 | 0.263760 | 0.217580 |
| 0.500000 | 0.264050 | 0.217598 |
| 0.750000 | 0.264388 | 0.217622 |
| 0.900000 | 0.264969 | 0.217663 |
| 0.990000 | 0.267758 | 0.218548 |
| 1.000000 | 0.358629 | 0.243456 |
[' Total Asset Return Growth Rate Ratio', ' Continuous Net Profit Growth Rate']
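The logged counts are consistent with a simple rule: a row is flagged when any selected column exceeds the threshold passed to the call (0.8 here, applied to the raw value rather than to a quantile), flagged rows of the majority class (`Bankrupt?` = 0) are dropped, and flagged values in minority-class rows (`Bankrupt?` = 1) are replaced by the column median. The sketch below implements that reading. It is an assumption inferred from the printed output, not the notebook's actual code, and `handle_outliers` and the toy data are hypothetical.

```python
import pandas as pd


def handle_outliers(df, cols, threshold, target="Bankrupt?"):
    """Drop majority-class outlier rows; impute minority-class outlier cells.

    Hypothetical helper: the rule is inferred from the logged row counts.
    """
    out = df.copy()
    is_outlier = out[cols] > threshold        # cell-level flags
    row_flagged = is_outlier.any(axis=1)      # row has at least one outlier
    majority = out[target] == 0

    # Medians are taken over non-outlier values only (an assumption).
    medians = out[cols].mask(is_outlier).median()

    # Minority class: keep the row, overwrite only the offending cells.
    for col in cols:
        cells = is_outlier[col] & ~majority
        out.loc[cells, col] = medians[col]

    # Majority class: remove the whole row.
    return out.loc[~(row_flagged & majority)]


# Toy data, made up for illustration.
toy = pd.DataFrame(
    {
        "a": [0.1, 0.2, 1.0, 0.3, 5e9],
        "b": [0.5, 0.6, 0.4, 0.95, 0.2],
        "Bankrupt?": [0, 0, 0, 1, 1],
    }
)
cleaned = handle_outliers(toy, ["a", "b"], threshold=0.9)
print(cleaned)
```

Keeping the minority-class rows preserves the few bankrupt examples in a heavily imbalanced dataset, at the cost of replacing some of their values with a typical one.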
print_skewness_and_handle_outlier(45,61,bank_data_stage_1,'+',.7,True)
****Displaying first 5 rows of features that are having it skewness from 45 to 61****
| | Inventory/Working Capital | Degree of Financial Leverage (DFL) | Total debt/Total net worth | Quick Assets/Current Liability | Revenue per person |
|---|---|---|---|---|---|
| 0 | 0.276920 | 0.026601 | 0.021266 | 0.001997 | 0.034164 |
| 1 | 0.289642 | 0.264577 | 0.012502 | 0.004136 | 0.006889 |
| 2 | 0.277456 | 0.026555 | 0.021248 | 0.006302 | 0.028997 |
| 3 | 0.276580 | 0.026697 | 0.009572 | 0.002961 | 0.015463 |
| 4 | 0.287913 | 0.024752 | 0.005150 | 0.004275 | 0.058111 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 45 to 61 **** *******************************************************BEFORE*******************************************************
| Percentile | Inventory/Working Capital | Degree of Financial Leverage (DFL) | Total debt/Total net worth | Quick Assets/Current Liability | Revenue per person |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.273256 | 0.025320 | 0.000492 | 0.000562 | 0.002303 |
| 0.100000 | 0.276972 | 0.026650 | 0.001625 | 0.003151 | 0.006560 |
| 0.250000 | 0.277035 | 0.026791 | 0.003008 | 0.005250 | 0.010447 |
| 0.500000 | 0.277178 | 0.026808 | 0.005543 | 0.007912 | 0.018630 |
| 0.750000 | 0.277429 | 0.026913 | 0.009254 | 0.012937 | 0.035860 |
| 0.900000 | 0.277956 | 0.027279 | 0.014962 | 0.022575 | 0.079109 |
| 0.990000 | 0.283658 | 0.036543 | 0.039028 | 0.066084 | 0.268260 |
| 1.000000 | 1.000000 | 1.000000 | 9940000000.000000 | 8820000000.000000 | 7050000000.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.7 percentile **********
| | Inventory/Working Capital | Degree of Financial Leverage (DFL) | Total debt/Total net worth | Quick Assets/Current Liability | Revenue per person | Bankrupt? |
|---|---|---|---|---|---|---|
| 0 | 0.277287 | 0.026790 | 821000000.000000 | 0.222624 | 0.013431 | 0 |
| 1 | 0.277230 | 1.000000 | 0.004319 | 0.007689 | 0.025769 | 0 |
| 2 | 0.277295 | 0.026790 | 5930000000.000000 | 0.100846 | 0.066554 | 0 |
| 3 | 0.276990 | 0.026791 | 6470000000.000000 | 0.148478 | 0.049584 | 0 |
| 4 | 0.277251 | 0.026772 | 1190000000.000000 | 0.266294 | 0.000632 | 0 |
| 5 | 0.277719 | 0.026802 | 0.013172 | 0.005739 | 1.000000 | 0 |
| 6 | 0.277005 | 0.026791 | 1820000000.000000 | 0.251210 | 0.002858 | 0 |
| 7 | 0.278049 | 0.026793 | 0.008267 | 8140000000.000000 | 0.158101 | 0 |
| 8 | 0.277302 | 0.026875 | 0.011483 | 0.006766 | 0.991117 | 0 |
| 9 | 0.277289 | 0.026793 | 9940000000.000000 | 0.198357 | 0.074189 | 0 |
| 10 | 0.277333 | 0.026851 | 0.007450 | 0.006928 | 0.840788 | 0 |
| 11 | 1.000000 | 0.026968 | 0.002708 | 0.002510 | 0.006422 | 0 |
| 12 | 0.279411 | 0.027092 | 0.018772 | 8820000000.000000 | 0.107607 | 0 |
| 13 | 0.276975 | 0.026620 | 1.000000 | 0.001226 | 0.093019 | 1 |
| 14 | 0.276392 | 0.026351 | 0.044451 | 0.000253 | 7050000000.000000 | 1 |
| 15 | 0.276975 | 0.026764 | 3470000000.000000 | 0.107927 | 0.036050 | 1 |
3 minority class
3 rows of specific column are imputed to median value
13 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Inventory/Working Capital | Degree of Financial Leverage (DFL) | Total debt/Total net worth | Quick Assets/Current Liability | Revenue per person |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.273252 | 0.025317 | 0.000491 | 0.000563 | 0.002336 |
| 0.100000 | 0.276972 | 0.026650 | 0.001619 | 0.003153 | 0.006567 |
| 0.250000 | 0.277035 | 0.026791 | 0.003006 | 0.005250 | 0.010446 |
| 0.500000 | 0.277178 | 0.026808 | 0.005527 | 0.007909 | 0.018610 |
| 0.750000 | 0.277429 | 0.026913 | 0.009214 | 0.012915 | 0.035766 |
| 0.900000 | 0.277954 | 0.027279 | 0.014894 | 0.022472 | 0.078202 |
| 0.990000 | 0.283639 | 0.036361 | 0.036023 | 0.063906 | 0.264291 |
| 1.000000 | 0.466868 | 0.540672 | 0.648868 | 0.325189 | 0.695880 |
[' Inventory/Working Capital', ' Degree of Financial Leverage (DFL)', ' Total debt/Total net worth', ' Quick Assets/Current Liability', ' Revenue per person']
print_skewness_and_handle_outlier(30,44,bank_data_stage_1,'+',.8,True)
****Displaying first 5 rows of features that are having it skewness from 30 to 44****
| | Average Collection Days | Quick Ratio | Equity to Long-term Liability | Non-industry income and expenditure/revenue | Revenue Per Share (Yuan ¥) |
|---|---|---|---|---|---|
| 0 | 0.003487 | 0.001208 | 0.126549 | 0.302646 | 0.017560 |
| 1 | 0.004917 | 0.004039 | 0.120916 | 0.303556 | 0.021144 |
| 2 | 0.004227 | 0.005348 | 0.117922 | 0.302035 | 0.005944 |
| 3 | 0.003215 | 0.002896 | 0.120760 | 0.303350 | 0.014368 |
| 4 | 0.004367 | 0.003727 | 0.110933 | 0.303475 | 0.029690 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 30 to 44 **** *******************************************************BEFORE*******************************************************
| Percentile | Average Collection Days | Quick Ratio | Equity to Long-term Liability | Non-industry income and expenditure/revenue | Revenue Per Share (Yuan ¥) |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.235090 | 0.000000 |
| 0.010000 | 0.000430 | 0.000432 | 0.110933 | 0.302244 | 0.001751 |
| 0.100000 | 0.002685 | 0.002642 | 0.110933 | 0.303332 | 0.009374 |
| 0.250000 | 0.004398 | 0.004733 | 0.110933 | 0.303466 | 0.015699 |
| 0.500000 | 0.006584 | 0.007427 | 0.112351 | 0.303525 | 0.027406 |
| 0.750000 | 0.008975 | 0.012221 | 0.117111 | 0.303585 | 0.046357 |
| 0.900000 | 0.011751 | 0.021549 | 0.123092 | 0.303718 | 0.076885 |
| 0.990000 | 0.022127 | 0.064123 | 0.141763 | 0.304789 | 0.209649 |
| 1.000000 | 8800000000.000000 | 9230000000.000000 | 1.000000 | 1.000000 | 3020000000.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.8 percentile **********
| | Average Collection Days | Quick Ratio | Equity to Long-term Liability | Non-industry income and expenditure/revenue | Revenue Per Share (Yuan ¥) | Bankrupt? |
|---|---|---|---|---|---|---|
| 0 | 1790000000.000000 | 0.000283 | 0.115993 | 0.303503 | 0.015049 | 0 |
| 1 | 0.000582 | 5800000000.000000 | 0.110933 | 0.303387 | 0.008848 | 0 |
| 2 | 2130000000.000000 | 0.000656 | 0.110933 | 0.303293 | 0.001981 | 0 |
| 3 | 8800000000.000000 | 0.001520 | 0.110933 | 0.303550 | 0.160655 | 0 |
| 4 | 273000000.000000 | 0.042353 | 0.110933 | 0.303593 | 0.009135 | 0 |
| 5 | 0.001020 | 0.003398 | 0.110933 | 0.303526 | 0.993285 | 0 |
| 6 | 0.016404 | 0.005301 | 0.114850 | 1.000000 | 0.000000 | 0 |
| 7 | 751000000.000000 | 0.019313 | 0.119212 | 0.303552 | 0.040232 | 0 |
| 8 | 0.002247 | 0.005004 | 0.110933 | 0.303526 | 1.000000 | 0 |
| 9 | 478000000.000000 | 0.004243 | 0.110933 | 0.303565 | 0.036935 | 0 |
| 10 | 0.000202 | 8920000000.000000 | 0.118868 | 0.303523 | 0.031293 | 0 |
| 11 | 7940000000.000000 | 0.000248 | 0.110933 | 0.303544 | 0.022369 | 0 |
| 12 | 0.000213 | 8480000000.000000 | 0.115753 | 0.303519 | 0.015019 | 0 |
| 13 | 0.000211 | 4800000000.000000 | 0.110933 | 0.303649 | 0.003131 | 0 |
| 14 | 1790000000.000000 | 0.001490 | 0.111470 | 0.320065 | 0.001270 | 0 |
| 15 | 0.065120 | 0.006201 | 0.111182 | 0.730252 | 1510000000.000000 | 0 |
| 16 | 0.001289 | 3490000000.000000 | 0.110933 | 0.303413 | 0.011737 | 0 |
| 17 | 3240000000.000000 | 0.000659 | 0.112014 | 0.303541 | 0.030960 | 0 |
| 18 | 0.003621 | 8170000000.000000 | 0.110933 | 0.300096 | 0.002012 | 0 |
| 19 | 598000000.000000 | 0.002935 | 0.114540 | 0.330080 | 0.001724 | 0 |
| 20 | 4350000000.000000 | 0.000244 | 0.110933 | 0.303493 | 0.027103 | 0 |
| 21 | 0.000577 | 0.322045 | 0.110933 | 0.297521 | 3020000000.000000 | 0 |
| 22 | 5810000000.000000 | 0.002072 | 0.112525 | 0.303547 | 0.038659 | 0 |
| 23 | 8370000000.000000 | 0.004860 | 0.110933 | 0.303554 | 0.040171 | 0 |
| 24 | 0.000541 | 5240000000.000000 | 0.110933 | 0.303503 | 0.036814 | 0 |
| 25 | 2480000000.000000 | 2990000000.000000 | 0.112071 | 0.303399 | 0.024154 | 0 |
| 26 | 2480000000.000000 | 2990000000.000000 | 0.112071 | 0.303399 | 0.024154 | 0 |
| 27 | 0.131772 | 0.001199 | 0.112778 | 0.446321 | 1510000000.000000 | 0 |
| 28 | 7860000000.000000 | 0.006972 | 0.111496 | 0.303511 | 0.029584 | 0 |
| 29 | 0.005974 | 0.004721 | 1.000000 | 0.303529 | 0.145106 | 0 |
| 30 | 478000000.000000 | 0.000778 | 0.120176 | 0.303529 | 0.039309 | 0 |
| 31 | 137000000.000000 | 0.000000 | 0.110933 | 0.303296 | 0.003796 | 1 |
| 32 | 0.014798 | 9230000000.000000 | 0.116022 | 0.235090 | 0.000136 | 1 |
| 33 | 0.000677 | 0.000996 | 0.862994 | 0.303528 | 0.166039 | 1 |
| 34 | 0.010649 | 0.001714 | 0.922128 | 0.302381 | 0.018740 | 1 |
4 minority class
4 rows of specific column are imputed to median value
31 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Average Collection Days | Quick Ratio | Equity to Long-term Liability | Non-industry income and expenditure/revenue | Revenue Per Share (Yuan ¥) |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000100 | 0.000000 | 0.271546 | 0.000106 |
| 0.010000 | 0.000442 | 0.000442 | 0.110933 | 0.302337 | 0.001800 |
| 0.100000 | 0.002694 | 0.002678 | 0.110933 | 0.303332 | 0.009423 |
| 0.250000 | 0.004401 | 0.004752 | 0.110933 | 0.303466 | 0.015715 |
| 0.500000 | 0.006577 | 0.007434 | 0.112361 | 0.303525 | 0.027406 |
| 0.750000 | 0.008955 | 0.012213 | 0.117117 | 0.303585 | 0.046365 |
| 0.900000 | 0.011703 | 0.021503 | 0.123095 | 0.303718 | 0.076694 |
| 0.990000 | 0.019747 | 0.061341 | 0.141209 | 0.304698 | 0.205835 |
| 1.000000 | 0.718671 | 0.272800 | 0.480440 | 0.321553 | 0.720964 |
[' Average Collection Days', ' Quick Ratio', ' Equity to Long-term Liability', ' Non-industry income and expenditure/revenue', ' Revenue Per Share (Yuan ¥)']
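Each call passes a lower and an upper skewness bound, and the output then reports the features whose skewness falls in that range. One way to select such columns with pandas is sketched here. `columns_in_skew_range` is a hypothetical name, and treating the lower bound as inclusive and the upper bound as exclusive is an assumption, since the output does not show how the notebook handles the boundaries.

```python
import pandas as pd


def columns_in_skew_range(df, low, high, target="Bankrupt?"):
    """List feature columns whose sample skewness lies in [low, high).

    Hypothetical helper; the boundary handling is an assumption.
    """
    skewness = df.drop(columns=[target]).skew()
    selected = skewness[(skewness >= low) & (skewness < high)]
    return selected.index.tolist()


# Toy data, made up for illustration: one symmetric and one right-skewed column.
toy_skew = pd.DataFrame(
    {
        "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
        "right_skewed": [0.0, 0.0, 0.0, 0.0, 10.0],
        "Bankrupt?": [0, 0, 0, 1, 0],
    }
)
high_skew_cols = columns_in_skew_range(toy_skew, 1, 5)
low_skew_cols = columns_in_skew_range(toy_skew, -0.5, 0.5)
print(high_skew_cols, low_skew_cols)
```

Working through the features in bands, from the most skewed downward, lets a different outlier threshold be chosen for each band, as the successive calls in this notebook do.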
print_skewness_and_handle_outlier(20,30,bank_data_stage_1,'+',.9,True)
****Displaying first 5 rows of features that are having it skewness from 20 to 30****
| | Borrowing dependency | Total assets to GNP price | Long-term fund suitability ratio (A) | Accounts Receivable Turnover | Allocation rate per person |
|---|---|---|---|---|---|
| 0 | 0.390284 | 0.009219 | 0.005024 | 0.001814 | 0.037135 |
| 1 | 0.376760 | 0.008323 | 0.005059 | 0.001286 | 0.012335 |
| 2 | 0.379093 | 0.040003 | 0.005100 | 0.001495 | 0.141016 |
| 3 | 0.379743 | 0.003252 | 0.005047 | 0.001966 | 0.021320 |
| 4 | 0.375025 | 0.003878 | 0.005303 | 0.001449 | 0.023988 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 20 to 30 **** *******************************************************BEFORE*******************************************************
| Percentile | Borrowing dependency | Total assets to GNP price | Long-term fund suitability ratio (A) | Accounts Receivable Turnover | Allocation rate per person |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 0.010000 | 0.369637 | 0.000174 | 0.004914 | 0.000335 | 0.000233 |
| 0.100000 | 0.369637 | 0.000479 | 0.005072 | 0.000542 | 0.001591 |
| 0.250000 | 0.370177 | 0.000898 | 0.005245 | 0.000708 | 0.004124 |
| 0.500000 | 0.372624 | 0.002070 | 0.005661 | 0.000964 | 0.007817 |
| 0.750000 | 0.376256 | 0.005235 | 0.006824 | 0.001441 | 0.014770 |
| 0.900000 | 0.380722 | 0.013585 | 0.010219 | 0.002365 | 0.030992 |
| 0.990000 | 0.399922 | 0.159248 | 0.045736 | 0.015909 | 0.157046 |
| 1.000000 | 1.000000 | 9820000000.000000 | 1.000000 | 9330000000.000000 | 9570000000.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.9 percentile **********
| | Borrowing dependency | Total assets to GNP price | Long-term fund suitability ratio (A) | Accounts Receivable Turnover | Allocation rate per person | Bankrupt? |
|---|---|---|---|---|---|---|
| 0 | 0.373670 | 0.000782 | 0.029761 | 0.000797 | 8520000000.000000 | 0 |
| 1 | 0.416696 | 0.020016 | 0.005634 | 304000000.000000 | 0.043170 | 0 |
| 2 | 0.369637 | 8780000000.000000 | 0.009171 | 0.000657 | 0.001214 | 0 |
| 3 | 0.400874 | 0.018425 | 0.005599 | 3650000000.000000 | 0.043680 | 0 |
| 4 | 0.369637 | 7420000000.000000 | 0.008207 | 0.009962 | 0.000723 | 0 |
| 5 | 0.382681 | 9820000000.000000 | 0.006637 | 0.000864 | 0.000850 | 0 |
| 6 | 0.369637 | 7790000000.000000 | 0.005471 | 0.003143 | 0.003505 | 0 |
| 7 | 0.406525 | 0.020289 | 0.005886 | 3250000000.000000 | 0.034783 | 0 |
| 8 | 0.371015 | 0.000486 | 0.047691 | 0.001057 | 9570000000.000000 | 0 |
| 9 | 0.382589 | 5260000000.000000 | 0.005207 | 0.001079 | 0.000521 | 0 |
| 10 | 0.402899 | 0.008384 | 0.004842 | 7300000000.000000 | 0.413331 | 0 |
| 11 | 0.402818 | 0.007000 | 0.004852 | 4460000000.000000 | 0.397763 | 0 |
| 12 | 0.369758 | 8880000000.000000 | 0.006392 | 9330000000.000000 | 0.001231 | 0 |
| 13 | 0.369758 | 8880000000.000000 | 0.006392 | 9330000000.000000 | 0.001231 | 0 |
| 14 | 0.369637 | 9390000000.000000 | 0.005811 | 0.001045 | 0.004742 | 0 |
| 15 | 0.403770 | 0.003720 | 0.006372 | 1420000000.000000 | 0.039502 | 0 |
| 16 | 0.383563 | 6410000000.000000 | 0.004993 | 0.000580 | 0.003697 | 0 |
| 17 | 0.371300 | 0.000666 | 0.040531 | 0.000854 | 6210000000.000000 | 0 |
| 18 | 0.369770 | 0.000331 | 0.069853 | 0.001349 | 8280000000.000000 | 0 |
| 19 | 0.369637 | 0.047880 | 0.005028 | 0.003332 | 1.000000 | 0 |
| 20 | 0.380369 | 9650000000.000000 | 0.008541 | 0.004419 | 0.001473 | 0 |
| 21 | 0.369637 | 1920000000.000000 | 0.005577 | 0.002041 | 0.001432 | 0 |
| 22 | 0.437324 | 0.023522 | 0.005659 | 2840000000.000000 | 0.033298 | 0 |
| 23 | 0.403900 | 0.019353 | 0.005695 | 3250000000.000000 | 0.039626 | 0 |
| 24 | 0.373378 | 4880000000.000000 | 0.005331 | 0.000521 | 0.007362 | 0 |
| 25 | 0.369637 | 8890000000.000000 | 0.013708 | 0.009741 | 0.000349 | 0 |
| 26 | 0.404433 | 0.019708 | 0.005648 | 3650000000.000000 | 0.045817 | 0 |
| 27 | 0.395236 | 4370000000.000000 | 0.006144 | 0.021793 | 0.000900 | 0 |
| 28 | 0.357440 | 3030000000.000000 | 0.000000 | 0.001142 | 0.001259 | 0 |
| 29 | 0.369715 | 0.003876 | 0.011633 | 994000000.000000 | 0.005058 | 0 |
| 30 | 0.373412 | 0.001765 | 0.005234 | 7910000000.000000 | 0.153342 | 0 |
| 31 | 0.392974 | 0.006550 | 0.004884 | 4870000000.000000 | 0.418038 | 0 |
| 32 | 0.372707 | 0.000745 | 0.045113 | 0.000820 | 5290000000.000000 | 0 |
| 33 | 0.371337 | 394000000.000000 | 0.005541 | 0.000970 | 0.002772 | 0 |
| 34 | 0.372278 | 0.010129 | 0.007632 | 7300000000.000000 | 0.148772 | 0 |
| 35 | 0.401300 | 0.005256 | 0.004945 | 304000000.000000 | 0.212764 | 0 |
| 36 | 0.374354 | 0.000485 | 0.060427 | 0.001128 | 8410000000.000000 | 0 |
| 37 | 0.369897 | 0.001326 | 0.145681 | 0.000623 | 7970000000.000000 | 0 |
| 38 | 0.374084 | 0.002995 | 0.254191 | 0.000359 | 23200000.000000 | 0 |
| 39 | 0.385721 | 6860000000.000000 | 0.004952 | 0.000392 | 0.003878 | 0 |
| 40 | 0.385257 | 0.018980 | 1.000000 | 0.013612 | 0.000185 | 0 |
| 41 | 0.369637 | 0.002582 | 0.608185 | 0.001114 | 8900000000.000000 | 0 |
| 42 | 0.465365 | 0.002302 | 0.005076 | 812000000.000000 | 0.051331 | 0 |
| 43 | 0.370128 | 0.001094 | 0.085521 | 0.000629 | 2190000000.000000 | 0 |
| 44 | 0.372771 | 0.013550 | 0.984855 | 0.001108 | 7910000000.000000 | 0 |
| 45 | 0.372771 | 0.013550 | 0.984855 | 0.001108 | 7910000000.000000 | 0 |
| 46 | 0.369637 | 2710000000.000000 | 0.005346 | 0.000781 | 0.003710 | 1 |
| 47 | 0.431840 | 8140000000.000000 | 0.005328 | 0.000684 | 0.001540 | 1 |
| 48 | 0.954819 | 0.003198 | 0.004826 | 0.009341 | 0.037792 | 1 |
| 49 | 0.395616 | 0.002044 | 0.923930 | 0.450399 | 3480000000.000000 | 1 |
| 50 | 0.395616 | 0.002044 | 0.923930 | 0.450399 | 3480000000.000000 | 1 |
| 51 | 1.000000 | 0.000456 | 0.004902 | 0.000594 | 0.009052 | 1 |
| 52 | 0.369637 | 9170000000.000000 | 0.006451 | 0.001057 | 0.000667 | 1 |
| 53 | 0.409987 | 0.003736 | 0.004874 | 1220000000.000000 | 0.013696 | 1 |
7 minority class
8 rows of specific column are imputed to median value
46 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Borrowing dependency | Total assets to GNP price | Long-term fund suitability ratio (A) | Accounts Receivable Turnover | Allocation rate per person |
|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.004129 | 0.000000 | 0.000000 |
| 0.010000 | 0.369637 | 0.000174 | 0.004922 | 0.000335 | 0.000233 |
| 0.100000 | 0.369637 | 0.000479 | 0.005073 | 0.000542 | 0.001619 |
| 0.250000 | 0.370182 | 0.000895 | 0.005245 | 0.000706 | 0.004136 |
| 0.500000 | 0.372611 | 0.002063 | 0.005661 | 0.000962 | 0.007802 |
| 0.750000 | 0.376224 | 0.005192 | 0.006819 | 0.001435 | 0.014669 |
| 0.900000 | 0.380579 | 0.013198 | 0.010149 | 0.002329 | 0.030153 |
| 0.990000 | 0.397356 | 0.127666 | 0.041024 | 0.013402 | 0.128047 |
| 1.000000 | 0.734611 | 0.555479 | 0.696961 | 0.062632 | 0.799847 |
[' Borrowing dependency', ' Total assets to GNP price', ' Long-term fund suitability ratio (A)', ' Accounts Receivable Turnover', ' Allocation rate per person']
print_skewness_and_handle_outlier(13,20,bank_data_stage_1,'+',.95,True)
****Displaying first 5 rows of features that are having it skewness from 13 to 20****
| | Inventory and accounts receivable/Net value | Current Liability to Current Assets | Cash/Current Liability | Cash Flow to Equity |
|---|---|---|---|---|
| 0 | 0.398036 | 0.118250 | 1.473360e-04 | 0.312905 |
| 1 | 0.397725 | 0.047775 | 1.383910e-03 | 0.314163 |
| 2 | 0.406580 | 0.025346 | 5.340000e+09 | 0.314515 |
| 3 | 0.397925 | 0.067250 | 1.010646e-03 | 0.302382 |
| 4 | 0.400079 | 0.047725 | 6.804636e-04 | 0.311567 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 13 to 20 **** *******************************************************BEFORE*******************************************************
| Percentile | Inventory and accounts receivable/Net value | Current Liability to Current Assets | Cash/Current Liability | Cash Flow to Equity |
|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000770 | 0.000000 | 0.000000 |
| 0.010000 | 0.393995 | 0.004009 | 0.000174 | 0.295908 |
| 0.100000 | 0.395740 | 0.010881 | 0.000817 | 0.309430 |
| 0.250000 | 0.397424 | 0.018009 | 0.001987 | 0.313004 |
| 0.500000 | 0.400123 | 0.027500 | 0.004925 | 0.314956 |
| 0.750000 | 0.404469 | 0.038225 | 0.012796 | 0.317707 |
| 0.900000 | 0.410932 | 0.049704 | 0.030290 | 0.322786 |
| 0.990000 | 0.431420 | 0.109633 | 0.177336 | 0.338199 |
| 1.000000 | 0.707445 | 0.650661 | 9650000000.000000 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.95 percentile **********
| | Inventory and accounts receivable/Net value | Current Liability to Current Assets | Cash/Current Liability | Cash Flow to Equity | Bankrupt? |
|---|---|---|---|---|---|
| 0 | 0.426719 | 0.034957 | 7580000000.000000 | 0.313553 | 0 |
| 1 | 0.414449 | 0.033030 | 8550000000.000000 | 0.312888 | 0 |
| 2 | 0.394885 | 0.040896 | 9060000000.000000 | 0.310667 | 0 |
| 3 | 0.399834 | 0.043580 | 5900000000.000000 | 0.314501 | 0 |
| 4 | 0.428287 | 0.033204 | 9170000000.000000 | 0.314467 | 0 |
| 5 | 0.419155 | 0.029034 | 1300000000.000000 | 0.314287 | 0 |
| 6 | 0.407095 | 0.037160 | 4610000000.000000 | 0.311171 | 0 |
| 7 | 0.425422 | 0.033021 | 4610000000.000000 | 0.312245 | 0 |
| 8 | 0.398329 | 0.042739 | 9650000000.000000 | 0.313738 | 0 |
| 9 | 0.415081 | 0.059007 | 1840000000.000000 | 0.310570 | 0 |
| 10 | 0.397457 | 0.112154 | 6610000000.000000 | 0.314369 | 0 |
| 11 | 0.396652 | 0.075770 | 8590000000.000000 | 0.313749 | 0 |
| 12 | 0.400667 | 0.050842 | 8330000000.000000 | 0.314293 | 0 |
| 13 | 0.410790 | 0.035499 | 7510000000.000000 | 0.313670 | 0 |
| 14 | 0.423331 | 0.040878 | 8870000000.000000 | 0.313287 | 0 |
| 15 | 0.399777 | 0.055387 | 6870000000.000000 | 0.313302 | 0 |
| 16 | 0.397619 | 0.045314 | 3830000000.000000 | 0.314393 | 0 |
| 17 | 0.405171 | 0.033506 | 9410000000.000000 | 0.314091 | 0 |
| 18 | 0.400018 | 0.044585 | 8870000000.000000 | 0.308577 | 0 |
| 19 | 0.410212 | 0.041792 | 287000000.000000 | 0.314553 | 0 |
| 20 | 0.413891 | 0.030715 | 6970000000.000000 | 0.314694 | 0 |
| 21 | 0.430161 | 0.031565 | 5740000000.000000 | 0.314544 | 0 |
| 22 | 0.406061 | 0.089706 | 7540000000.000000 | 0.313538 | 0 |
| 23 | 0.406961 | 0.070746 | 0.012010 | 1.000000 | 0 |
| 24 | 0.419876 | 0.065144 | 3080000000.000000 | 0.313816 | 0 |
| 25 | 0.396269 | 0.080904 | 4950000000.000000 | 0.314529 | 0 |
| 26 | 0.406580 | 0.025346 | 5340000000.000000 | 0.314515 | 1 |
| 27 | 0.425839 | 0.042234 | 3370000000.000000 | 0.309981 | 1 |
| 28 | 0.408737 | 0.080395 | 444000000.000000 | 0.286312 | 1 |
| 29 | 0.441273 | 0.035799 | 2090000000.000000 | 0.312128 | 1 |
| 30 | 0.413586 | 0.084561 | 3110000000.000000 | 0.314222 | 1 |
| 31 | 0.396232 | 0.142964 | 6730000000.000000 | 0.314092 | 1 |
| 32 | 0.411399 | 0.055279 | 696000000.000000 | 0.314013 | 1 |
| 33 | 0.415298 | 0.055361 | 5390000000.000000 | 0.311226 | 1 |
| 34 | 0.406098 | 0.069956 | 9010000000.000000 | 0.310119 | 1 |
| 35 | 0.434070 | 0.104796 | 1140000000.000000 | 0.306167 | 1 |
| 36 | 0.415443 | 0.037514 | 3540000000.000000 | 0.313609 | 1 |
| 37 | 0.397559 | 0.084685 | 248000000.000000 | 0.306692 | 1 |
| 38 | 0.400885 | 0.023458 | 479000000.000000 | 0.313628 | 1 |
| 39 | 0.416035 | 0.021729 | 6840000000.000000 | 0.314395 | 1 |
| 40 | 0.405900 | 0.042721 | 7660000000.000000 | 0.314273 | 1 |
15 minority class
15 rows of specific column are imputed to median value
26 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Inventory and accounts receivable/Net value | Current Liability to Current Assets | Cash/Current Liability | Cash Flow to Equity |
|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000770 | 0.000000 | 0.000000 |
| 0.010000 | 0.393995 | 0.004001 | 0.000174 | 0.296046 |
| 0.100000 | 0.395734 | 0.010811 | 0.000810 | 0.309429 |
| 0.250000 | 0.397413 | 0.017950 | 0.001976 | 0.313014 |
| 0.500000 | 0.400108 | 0.027410 | 0.004873 | 0.314972 |
| 0.750000 | 0.404428 | 0.038119 | 0.012508 | 0.317728 |
| 0.900000 | 0.410782 | 0.049263 | 0.029010 | 0.322805 |
| 0.990000 | 0.431208 | 0.108641 | 0.120200 | 0.338137 |
| 1.000000 | 0.707445 | 0.650661 | 0.738106 | 0.569231 |
[' Inventory and accounts receivable/Net value', ' Current Liability to Current Assets', ' Cash/Current Liability', ' Cash Flow to Equity']
print_skewness_and_handle_outlier(7,10,bank_data_stage_1,'+',.95,True)
****Displaying first 5 rows of features that are having it skewness from 7 to 10****
| | Interest-bearing debt interest rate | Equity to Liability | Operating profit per person | Cash Flow Per Share | Net Worth Turnover Rate (times) | Total expense/Assets |
|---|---|---|---|---|---|---|
| 0 | 0.000725 | 0.016469 | 0.392913 | 0.311664 | 0.032903 | 0.064856 |
| 1 | 0.000647 | 0.020794 | 0.391590 | 0.318137 | 0.025484 | 0.025516 |
| 2 | 0.000790 | 0.016474 | 0.381968 | 0.307102 | 0.013387 | 0.021387 |
| 3 | 0.000449 | 0.023982 | 0.378497 | 0.321674 | 0.028065 | 0.024161 |
| 4 | 0.000686 | 0.035490 | 0.394371 | 0.319162 | 0.040161 | 0.026385 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 7 to 10 **** *******************************************************BEFORE*******************************************************
| Percentile | Interest-bearing debt interest rate | Equity to Liability | Operating profit per person | Cash Flow Per Share | Net Worth Turnover Rate (times) | Total expense/Assets |
|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.003946 | 0.000000 | 0.128740 | 0.009032 | 0.000853 |
| 0.010000 | 0.000000 | 0.014116 | 0.359405 | 0.282839 | 0.010806 | 0.004505 |
| 0.100000 | 0.000000 | 0.019273 | 0.387479 | 0.311771 | 0.017097 | 0.009379 |
| 0.250000 | 0.000205 | 0.024618 | 0.392472 | 0.317818 | 0.021935 | 0.014746 |
| 0.500000 | 0.000321 | 0.033961 | 0.395920 | 0.322558 | 0.029677 | 0.022767 |
| 0.750000 | 0.000534 | 0.052997 | 0.401777 | 0.328712 | 0.042903 | 0.035914 |
| 0.900000 | 0.000786 | 0.086714 | 0.413756 | 0.337094 | 0.064548 | 0.054298 |
| 0.990000 | 750000000.000000 | 0.223302 | 0.495997 | 0.367599 | 0.171310 | 0.121626 |
| 1.000000 | 990000000.000000 | 0.798122 | 1.000000 | 0.577386 | 0.510645 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.95 percentile **********
| | Interest-bearing debt interest rate | Equity to Liability | Operating profit per person | Cash Flow Per Share | Net Worth Turnover Rate (times) | Total expense/Assets | Bankrupt? |
|---|---|---|---|---|---|---|---|
| 0 | 520000000.000000 | 0.072854 | 0.396418 | 0.325387 | 0.019032 | 0.027888 | 0 |
| 1 | 490000000.000000 | 0.031042 | 0.410563 | 0.333805 | 0.032581 | 0.012962 | 0 |
| 2 | 750000000.000000 | 0.031186 | 0.407623 | 0.325564 | 0.042258 | 0.022009 | 0 |
| 3 | 60000000.000000 | 0.021747 | 0.401076 | 0.306925 | 0.065645 | 0.021840 | 0 |
| 4 | 70000000.000000 | 0.042678 | 0.404819 | 0.324680 | 0.048871 | 0.028932 | 0 |
| 5 | 730000000.000000 | 0.038066 | 0.401269 | 0.319375 | 0.028548 | 0.018908 | 0 |
| 6 | 690000000.000000 | 0.028336 | 0.400330 | 0.338367 | 0.041290 | 0.017965 | 0 |
| 7 | 440000000.000000 | 0.089848 | 0.396181 | 0.323973 | 0.034194 | 0.008222 | 0 |
| 8 | 820000000.000000 | 0.044549 | 0.432091 | 0.335432 | 0.035323 | 0.007565 | 0 |
| 9 | 620000000.000000 | 0.024016 | 0.418749 | 0.325953 | 0.028548 | 0.008602 | 0 |
| 10 | 780000000.000000 | 0.073980 | 0.397537 | 0.328146 | 0.015323 | 0.012923 | 0 |
| 11 | 360000000.000000 | 0.071542 | 0.393987 | 0.387635 | 0.039194 | 0.009749 | 0 |
| 12 | 8000000.000000 | 0.064913 | 0.401144 | 0.330480 | 0.022419 | 0.028847 | 0 |
| 13 | 840000000.000000 | 0.106411 | 0.426585 | 0.326943 | 0.014839 | 0.005934 | 0 |
| 14 | 380000000.000000 | 0.045433 | 0.400579 | 0.334335 | 0.027258 | 0.015545 | 0 |
| 15 | 350000000.000000 | 0.027587 | 0.393716 | 0.324786 | 0.039194 | 0.022653 | 0 |
| 16 | 3000000.000000 | 0.082486 | 0.379933 | 0.309153 | 0.010645 | 0.066169 | 0 |
| 17 | 260000000.000000 | 0.037803 | 0.396463 | 0.323548 | 0.028065 | 0.026794 | 0 |
| 18 | 320000000.000000 | 0.040429 | 0.416872 | 0.329702 | 0.019032 | 0.009266 | 0 |
| 19 | 960000000.000000 | 0.027628 | 0.402467 | 0.327509 | 0.039355 | 0.016886 | 0 |
| 20 | 2000000.000000 | 0.091038 | 0.403779 | 0.334689 | 0.022581 | 0.022744 | 0 |
| 21 | 660000000.000000 | 0.073416 | 0.401574 | 0.326307 | 0.023065 | 0.037820 | 0 |
| 22 | 670000000.000000 | 0.085945 | 0.396949 | 0.326943 | 0.020645 | 0.027039 | 0 |
| 23 | 740000000.000000 | 0.037684 | 0.398555 | 0.321249 | 0.023387 | 0.035183 | 0 |
| 24 | 610000000.000000 | 0.028867 | 0.408856 | 0.342152 | 0.048226 | 0.024128 | 0 |
| 25 | 910000000.000000 | 0.029368 | 0.403473 | 0.308234 | 0.037419 | 0.038075 | 0 |
| 26 | 750000000.000000 | 0.025821 | 0.396181 | 0.323230 | 0.048871 | 0.012957 | 0 |
| 27 | 2000000.000000 | 0.026984 | 0.405192 | 0.329030 | 0.039032 | 0.012321 | 0 |
| 28 | 480000000.000000 | 0.134306 | 0.395299 | 0.322098 | 0.015806 | 0.013089 | 0 |
| 29 | 960000000.000000 | 0.041323 | 0.399482 | 0.314954 | 0.028226 | 0.004456 | 0 |
| 30 | 520000000.000000 | 0.036720 | 0.382952 | 0.314954 | 0.029355 | 0.054901 | 0 |
| 31 | 310000000.000000 | 0.105285 | 0.404796 | 0.326484 | 0.027419 | 0.026217 | 0 |
| 32 | 9000000.000000 | 0.035611 | 0.413910 | 0.342965 | 0.065000 | 0.041523 | 0 |
| 33 | 550000000.000000 | 0.020656 | 0.399403 | 0.329313 | 0.094516 | 0.018354 | 0 |
| 34 | 10000000.000000 | 0.093217 | 0.410721 | 0.329278 | 0.022419 | 0.016048 | 0 |
| 35 | 3000000.000000 | 0.054352 | 0.408335 | 0.339181 | 0.025161 | 0.015610 | 0 |
| 36 | 8000000.000000 | 0.050445 | 0.400669 | 0.328606 | 0.023226 | 0.010499 | 0 |
| 37 | 310000000.000000 | 0.063523 | 0.407092 | 0.334088 | 0.021935 | 0.032127 | 0 |
| 38 | 740000000.000000 | 0.051965 | 0.393942 | 0.325281 | 0.016290 | 0.027915 | 0 |
| 39 | 460000000.000000 | 0.062529 | 0.398872 | 0.318773 | 0.024839 | 0.023611 | 0 |
| 40 | 710000000.000000 | 0.218822 | 0.373454 | 0.318314 | 0.009032 | 0.011746 | 0 |
| 41 | 420000000.000000 | 0.030241 | 0.398238 | 0.307915 | 0.023387 | 0.013258 | 0 |
| 42 | 110000000.000000 | 0.073384 | 0.403270 | 0.351843 | 0.018871 | 0.009454 | 0 |
| 43 | 590000000.000000 | 0.091052 | 0.384320 | 0.316545 | 0.011613 | 0.029660 | 0 |
| 44 | 3000000.000000 | 0.059865 | 0.397662 | 0.342152 | 0.018871 | 0.006812 | 0 |
| 45 | 8000000.000000 | 0.261579 | 0.387406 | 0.318137 | 0.016774 | 0.015162 | 0 |
| 46 | 950000000.000000 | 0.035786 | 0.396113 | 0.341692 | 0.067097 | 0.045149 | 0 |
| 47 | 50000000.000000 | 0.054892 | 0.403869 | 0.336811 | 0.038871 | 0.024550 | 0 |
| 48 | 380000000.000000 | 0.030406 | 0.411841 | 0.320825 | 0.068065 | 0.026337 | 0 |
| 49 | 190000000.000000 | 0.034502 | 0.394010 | 0.321886 | 0.042581 | 0.030694 | 0 |
| 50 | 810000000.000000 | 0.041372 | 0.396203 | 0.328712 | 0.038871 | 0.047081 | 0 |
| 51 | 1000000.000000 | 0.108098 | 0.391873 | 0.328783 | 0.019032 | 0.009565 | 0 |
| 52 | 760000000.000000 | 0.118767 | 0.395988 | 0.329631 | 0.024032 | 0.039087 | 0 |
| 53 | 330000000.000000 | 0.050264 | 0.395943 | 0.327580 | 0.035323 | 0.036613 | 0 |
| 54 | 780000000.000000 | 0.091372 | 0.409489 | 0.352126 | 0.040645 | 0.006052 | 0 |
| 55 | 770000000.000000 | 0.053676 | 0.415651 | 0.336245 | 0.038871 | 0.029418 | 0 |
| 56 | 760000000.000000 | 0.062338 | 0.389363 | 0.321178 | 0.022581 | 0.034798 | 0 |
| 57 | 410000000.000000 | 0.052509 | 0.393795 | 0.322558 | 0.024839 | 0.023943 | 0 |
| 58 | 1000000.000000 | 0.058230 | 0.376281 | 0.310037 | 0.014677 | 0.064382 | 0 |
| 59 | 8000000.000000 | 0.055311 | 0.395480 | 0.317111 | 0.016290 | 0.015166 | 0 |
| 60 | 270000000.000000 | 0.036608 | 0.411388 | 0.325423 | 0.049032 | 0.017235 | 0 |
| 61 | 7000000.000000 | 0.048394 | 0.398340 | 0.307809 | 0.033710 | 0.027595 | 0 |
| 62 | 840000000.000000 | 0.050474 | 0.403937 | 0.327439 | 0.022903 | 0.025927 | 0 |
| 63 | 980000000.000000 | 0.061039 | 0.400534 | 0.322133 | 0.021613 | 0.011801 | 0 |
| 64 | 990000000.000000 | 0.019289 | 0.416963 | 0.349013 | 0.098387 | 0.025159 | 0 |
| 65 | 580000000.000000 | 0.056452 | 0.409466 | 0.330975 | 0.029839 | 0.023254 | 0 |
| 66 | 760000000.000000 | 0.119565 | 0.383246 | 0.305829 | 0.016774 | 0.021251 | 0 |
| 67 | 680000000.000000 | 0.047097 | 0.405837 | 0.332178 | 0.020806 | 0.005535 | 0 |
| 68 | 170000000.000000 | 0.120287 | 0.396757 | 0.323760 | 0.019516 | 0.018019 | 0 |
| 69 | 620000000.000000 | 0.079922 | 0.394880 | 0.316899 | 0.019194 | 0.016713 | 0 |
| 70 | 360000000.000000 | 0.022888 | 0.396350 | 0.322452 | 0.072742 | 0.024560 | 0 |
| 71 | 3000000.000000 | 0.024309 | 0.406232 | 0.371437 | 0.032258 | 0.011168 | 0 |
| 72 | 710000000.000000 | 0.050094 | 0.392981 | 0.323442 | 0.028548 | 0.021798 | 0 |
| 73 | 8000000.000000 | 0.096373 | 0.400172 | 0.301514 | 0.027258 | 0.031537 | 0 |
| 74 | 0.000000 | 0.019908 | 0.376303 | 0.312725 | 0.030161 | 1.000000 | 0 |
| 75 | 210000000.000000 | 0.044962 | 0.400511 | 0.323230 | 0.035000 | 0.016066 | 0 |
| 76 | 870000000.000000 | 0.054243 | 0.398284 | 0.318986 | 0.038871 | 0.009353 | 0 |
| 77 | 870000000.000000 | 0.057288 | 0.394903 | 0.328889 | 0.025323 | 0.081435 | 0 |
| 78 | 770000000.000000 | 0.040325 | 0.416069 | 0.333239 | 0.042742 | 0.028252 | 0 |
| 79 | 730000000.000000 | 0.066087 | 0.388537 | 0.316404 | 0.020323 | 0.008651 | 0 |
| 80 | 980000000.000000 | 0.052501 | 0.397797 | 0.322805 | 0.017419 | 0.009767 | 0 |
| 81 | 590000000.000000 | 0.233902 | 0.385767 | 0.305793 | 0.009194 | 0.006089 | 0 |
| 82 | 70000000.000000 | 0.090568 | 0.299552 | 0.310497 | 0.010806 | 0.002369 | 0 |
| 83 | 720000000.000000 | 0.032201 | 0.389283 | 0.325670 | 0.040323 | 0.016849 | 0 |
| 84 | 940000000.000000 | 0.126188 | 0.391432 | 0.329631 | 0.015968 | 0.036797 | 0 |
| 85 | 490000000.000000 | 0.049445 | 0.393535 | 0.327580 | 0.029677 | 0.023319 | 0 |
| 86 | 190000000.000000 | 0.048432 | 0.395559 | 0.319269 | 0.033548 | 0.056904 | 0 |
| 87 | 570000000.000000 | 0.181288 | 0.397990 | 0.322098 | 0.014355 | 0.006343 | 0 |
| 88 | 380000000.000000 | 0.021866 | 0.402987 | 0.320683 | 0.060323 | 0.011553 | 0 |
| 89 | 990000000.000000 | 0.180967 | 0.407386 | 0.331577 | 0.015645 | 0.020865 | 0 |
| 90 | 240000000.000000 | 0.025605 | 0.427794 | 0.298048 | 0.156129 | 0.020055 | 0 |
| 91 | 880000000.000000 | 0.055997 | 0.439203 | 0.322240 | 0.093710 | 0.026468 | 0 |
| 92 | 670000000.000000 | 0.081300 | 0.387056 | 0.311417 | 0.014677 | 0.038592 | 0 |
| 93 | 710000000.000000 | 0.087056 | 0.392596 | 0.324149 | 0.018710 | 0.020968 | 0 |
| 94 | 50000000.000000 | 0.032318 | 0.399573 | 0.322487 | 0.029355 | 0.020803 | 0 |
| 95 | 750000000.000000 | 0.031863 | 0.423498 | 0.324821 | 0.050968 | 0.016542 | 0 |
| 96 | 780000000.000000 | 0.031160 | 0.393286 | 0.340313 | 0.031935 | 0.013669 | 0 |
| 97 | 790000000.000000 | 0.019903 | 0.409048 | 0.304449 | 0.078548 | 0.020541 | 0 |
| 98 | 840000000.000000 | 0.062384 | 0.403507 | 0.315484 | 0.020323 | 0.055281 | 0 |
| 99 | 870000000.000000 | 0.062481 | 0.405486 | 0.321815 | 0.025000 | 0.019636 | 0 |
| 100 | 260000000.000000 | 0.028513 | 0.413435 | 0.329313 | 0.069516 | 0.017971 | 0 |
| 101 | 980000000.000000 | 0.076676 | 0.393670 | 0.321532 | 0.019032 | 0.014767 | 0 |
| 102 | 720000000.000000 | 0.058446 | 0.402298 | 0.338756 | 0.033548 | 0.023475 | 0 |
| 103 | 370000000.000000 | 0.094896 | 0.396248 | 0.325564 | 0.020645 | 0.017479 | 0 |
| 104 | 850000000.000000 | 0.055228 | 0.394835 | 0.324538 | 0.032742 | 0.055063 | 0 |
| 105 | 940000000.000000 | 0.062674 | 0.397198 | 0.312549 | 0.028710 | 0.016387 | 0 |
| 106 | 610000000.000000 | 0.091046 | 0.400941 | 0.337448 | 0.016452 | 0.015346 | 0 |
| 107 | 180000000.000000 | 0.078412 | 0.402331 | 0.326590 | 0.023387 | 0.024289 | 0 |
| 108 | 820000000.000000 | 0.038897 | 0.394349 | 0.333840 | 0.019355 | 0.010522 | 0 |
| 109 | 840000000.000000 | 0.030713 | 0.393297 | 0.316333 | 0.023710 | 0.008245 | 0 |
| 110 | 820000000.000000 | 0.063111 | 0.399245 | 0.323831 | 0.024516 | 0.009728 | 0 |
| 111 | 410000000.000000 | 0.047637 | 0.431684 | 0.315413 | 0.054355 | 0.027380 | 0 |
| 112 | 570000000.000000 | 0.064142 | 0.393049 | 0.316828 | 0.029839 | 0.027636 | 0 |
| 113 | 740000000.000000 | 0.032882 | 0.396135 | 0.314211 | 0.028387 | 0.021803 | 0 |
| 114 | 970000000.000000 | 0.057659 | 0.399539 | 0.320754 | 0.021935 | 0.005368 | 0 |
| 115 | 670000000.000000 | 0.022662 | 0.394191 | 0.369385 | 0.020323 | 0.001218 | 0 |
| 116 | 980000000.000000 | 0.031699 | 0.392981 | 0.313928 | 0.030000 | 0.023113 | 0 |
| 117 | 690000000.000000 | 0.025150 | 0.386570 | 0.316722 | 0.040484 | 0.018628 | 0 |
| 118 | 40000000.000000 | 0.022666 | 0.395977 | 0.329773 | 0.031129 | 0.009211 | 0 |
| 119 | 690000000.000000 | 0.158201 | 0.391929 | 0.320896 | 0.014677 | 0.009727 | 0 |
| 120 | 8000000.000000 | 0.065654 | 0.398612 | 0.345936 | 0.025968 | 0.026999 | 0 |
| 121 | 670000000.000000 | 0.034523 | 0.397198 | 0.317394 | 0.038548 | 0.068591 | 0 |
| 122 | 5000000.000000 | 0.088727 | 0.385790 | 0.317889 | 0.019516 | 0.089896 | 0 |
| 123 | 840000000.000000 | 0.058825 | 0.403767 | 0.325493 | 0.035484 | 0.083250 | 0 |
| 124 | 290000000.000000 | 0.110306 | 0.389758 | 0.319127 | 0.016613 | 0.060220 | 0 |
| 125 | 210000000.000000 | 0.041749 | 0.398725 | 0.329349 | 0.041452 | 0.035156 | 0 |
| 126 | 290000000.000000 | 0.053989 | 0.395774 | 0.331577 | 0.022419 | 0.059604 | 0 |
| 127 | 920000000.000000 | 0.033676 | 0.422288 | 0.325352 | 0.043387 | 0.023020 | 0 |
| 128 | 940000000.000000 | 0.042514 | 0.397933 | 0.322027 | 0.029032 | 0.017046 | 0 |
| 129 | 960000000.000000 | 0.024650 | 0.455202 | 0.328429 | 0.101129 | 0.008791 | 0 |
| 130 | 270000000.000000 | 0.050986 | 0.397323 | 0.338261 | 0.017742 | 0.004631 | 0 |
| 131 | 950000000.000000 | 0.082778 | 0.397515 | 0.326271 | 0.034677 | 0.044384 | 0 |
| 132 | 50000000.000000 | 0.231533 | 0.384862 | 0.315979 | 0.014516 | 0.090030 | 0 |
| 133 | 360000000.000000 | 0.036270 | 0.410597 | 0.312230 | 0.034516 | 0.016273 | 0 |
| 134 | 180000000.000000 | 0.083222 | 0.425013 | 0.343779 | 0.040161 | 0.034074 | 0 |
| 135 | 7000000.000000 | 0.095195 | 0.398962 | 0.315166 | 0.026935 | 0.029453 | 0 |
| 136 | 990000000.000000 | 0.028076 | 0.389871 | 0.304414 | 0.021129 | 0.012035 | 0 |
| 137 | 860000000.000000 | 0.028247 | 0.389091 | 0.322310 | 0.019032 | 0.001990 | 0 |
| 138 | 870000000.000000 | 0.028434 | 0.393263 | 0.320648 | 0.026613 | 0.013896 | 0 |
| 139 | 390000000.000000 | 0.040519 | 0.438287 | 0.338544 | 0.043065 | 0.039294 | 0 |
| 140 | 0.000239 | 0.037125 | 0.978178 | 0.319375 | 0.021613 | 0.007179 | 0 |
| 141 | 550000000.000000 | 0.158413 | 0.401325 | 0.323371 | 0.015806 | 0.005045 | 0 |
| 142 | 5000000.000000 | 0.043798 | 0.392348 | 0.312478 | 0.032258 | 0.039913 | 0 |
| 143 | 550000000.000000 | 0.045172 | 0.422842 | 0.343779 | 0.083871 | 0.033477 | 0 |
| 144 | 590000000.000000 | 0.053493 | 0.373714 | 0.311806 | 0.038710 | 0.108162 | 0 |
| 145 | 960000000.000000 | 0.039454 | 0.393150 | 0.324503 | 0.029516 | 0.020789 | 0 |
| 146 | 80000000.000000 | 0.066948 | 0.400669 | 0.338084 | 0.031935 | 0.038561 | 0 |
| 147 | 480000000.000000 | 0.074729 | 0.402151 | 0.327050 | 0.031935 | 0.030905 | 0 |
| 148 | 760000000.000000 | 0.026947 | 0.403745 | 0.314282 | 0.030323 | 0.018532 | 0 |
| 149 | 870000000.000000 | 0.034375 | 0.398284 | 0.328111 | 0.040484 | 0.023805 | 0 |
| 150 | 250000000.000000 | 0.097238 | 0.391138 | 0.334052 | 0.016129 | 0.008417 | 0 |
| 151 | 140000000.000000 | 0.022886 | 0.393817 | 0.310250 | 0.056452 | 0.010829 | 0 |
| 152 | 2000000.000000 | 0.044601 | 0.392619 | 0.321709 | 0.019032 | 0.008753 | 0 |
| 153 | 10000000.000000 | 0.044285 | 0.433301 | 0.364717 | 0.026129 | 0.032123 | 0 |
| 154 | 920000000.000000 | 0.073998 | 0.396667 | 0.320648 | 0.020645 | 0.024092 | 0 |
| 155 | 910000000.000000 | 0.027493 | 0.389159 | 0.325423 | 0.020323 | 0.003017 | 0 |
| 156 | 180000000.000000 | 0.016198 | 0.400036 | 0.255783 | 0.105323 | 0.031944 | 0 |
| 157 | 840000000.000000 | 0.027495 | 0.407001 | 0.319127 | 0.040323 | 0.021850 | 0 |
| 158 | 60000000.000000 | 0.034578 | 0.429705 | 0.346962 | 0.031129 | 0.027149 | 0 |
| 159 | 930000000.000000 | 0.012930 | 0.382386 | 0.351100 | 0.487097 | 0.042294 | 0 |
| 160 | 10000000.000000 | 0.038443 | 0.399652 | 0.342152 | 0.039355 | 0.036978 | 0 |
| 161 | 970000000.000000 | 0.037734 | 0.396848 | 0.320153 | 0.040484 | 0.020174 | 0 |
| 162 | 980000000.000000 | 0.040977 | 0.402625 | 0.318455 | 0.034839 | 0.027942 | 0 |
| 163 | 40000000.000000 | 0.042655 | 0.392268 | 0.323937 | 0.025323 | 0.071732 | 0 |
| 164 | 520000000.000000 | 0.066227 | 0.401359 | 0.330551 | 0.028710 | 0.040279 | 0 |
| 165 | 310000000.000000 | 0.093123 | 0.395152 | 0.322735 | 0.018387 | 0.015442 | 0 |
| 166 | 1000000.000000 | 0.038189 | 0.407567 | 0.326059 | 0.035484 | 0.020763 | 0 |
| 167 | 1000000.000000 | 0.128730 | 0.394485 | 0.326095 | 0.021129 | 0.023090 | 0 |
| 168 | 640000000.000000 | 0.063517 | 0.403700 | 0.338120 | 0.026935 | 0.043190 | 0 |
| 169 | 370000000.000000 | 0.045800 | 0.404333 | 0.328712 | 0.029677 | 0.025563 | 0 |
| 170 | 810000000.000000 | 0.159384 | 0.404570 | 0.322275 | 0.016774 | 0.009434 | 0 |
| 171 | 790000000.000000 | 0.015112 | 0.393365 | 0.348660 | 0.037742 | 0.007113 | 0 |
| 172 | 690000000.000000 | 0.120945 | 0.392201 | 0.318208 | 0.025645 | 0.024401 | 0 |
| 173 | 1000000.000000 | 0.068403 | 0.483108 | 0.328323 | 0.045323 | 0.021030 | 0 |
| 174 | 1000000.000000 | 0.015335 | 0.395887 | 0.313822 | 0.078065 | 0.023312 | 0 |
| 175 | 990000000.000000 | 0.051315 | 0.399030 | 0.333451 | 0.019355 | 0.025192 | 0 |
| 176 | 90000000.000000 | 0.033299 | 0.396135 | 0.320506 | 0.033710 | 0.020253 | 0 |
| 177 | 110000000.000000 | 0.047504 | 0.379786 | 0.311735 | 0.014032 | 0.072660 | 0 |
| 178 | 790000000.000000 | 0.079294 | 0.383144 | 0.318490 | 0.014677 | 0.003353 | 0 |
| 179 | 7000000.000000 | 0.107610 | 0.390459 | 0.318455 | 0.020968 | 0.009166 | 0 |
| 180 | 480000000.000000 | 0.041287 | 0.424662 | 0.321603 | 0.064677 | 0.008719 | 0 |
| 181 | 530000000.000000 | 0.042252 | 0.403858 | 0.304166 | 0.023548 | 0.024670 | 0 |
| 182 | 790000000.000000 | 0.041125 | 0.396113 | 0.331400 | 0.030000 | 0.021261 | 0 |
| 183 | 6000000.000000 | 0.038948 | 0.392935 | 0.324963 | 0.047581 | 0.025213 | 0 |
| 184 | 420000000.000000 | 0.036768 | 0.398747 | 0.344734 | 0.026774 | 0.086816 | 0 |
| 185 | 610000000.000000 | 0.030214 | 0.415877 | 0.357749 | 0.055484 | 0.063826 | 0 |
| 186 | 450000000.000000 | 0.046933 | 0.384139 | 0.314494 | 0.019194 | 0.090145 | 0 |
| 187 | 770000000.000000 | 0.045222 | 0.395672 | 0.382472 | 0.034839 | 0.044621 | 0 |
| 188 | 860000000.000000 | 0.095978 | 0.398171 | 0.323301 | 0.020806 | 0.018169 | 0 |
| 189 | 750000000.000000 | 0.050035 | 0.391567 | 0.307562 | 0.017419 | 0.006259 | 0 |
| 190 | 80000000.000000 | 0.041293 | 0.393229 | 0.327014 | 0.032903 | 0.070063 | 0 |
| 191 | 970000000.000000 | 0.029609 | 0.397436 | 0.319481 | 0.120645 | 0.034124 | 0 |
| 192 | 540000000.000000 | 0.048848 | 0.391013 | 0.317818 | 0.020968 | 0.022401 | 0 |
| 193 | 790000000.000000 | 0.034142 | 0.394665 | 0.323513 | 0.083710 | 0.145022 | 0 |
| 194 | 7000000.000000 | 0.014961 | 0.393817 | 0.311806 | 0.070161 | 0.151415 | 0 |
| 195 | 850000000.000000 | 0.047995 | 0.388786 | 0.312407 | 0.019516 | 0.107652 | 0 |
| 196 | 110000000.000000 | 0.058576 | 0.409568 | 0.317677 | 0.036290 | 0.028717 | 0 |
| 197 | 570000000.000000 | 0.059644 | 0.397153 | 0.324326 | 0.030806 | 0.037040 | 0 |
| 198 | 730000000.000000 | 0.075267 | 0.406289 | 0.321957 | 0.027742 | 0.021599 | 0 |
| 199 | 9000000.000000 | 0.050960 | 0.393546 | 0.322699 | 0.049355 | 0.024738 | 0 |
| 200 | 750000000.000000 | 0.058014 | 0.399211 | 0.320400 | 0.032581 | 0.026338 | 0 |
| 201 | 950000000.000000 | 0.124583 | 0.413955 | 0.345689 | 0.024516 | 0.015724 | 0 |
| 202 | 990000000.000000 | 0.034248 | 0.395265 | 0.333487 | 0.031774 | 0.017380 | 0 |
| 203 | 560000000.000000 | 0.029632 | 0.389849 | 0.323973 | 0.024032 | 0.049259 | 0 |
| 204 | 230000000.000000 | 0.063212 | 0.386502 | 0.318809 | 0.025323 | 0.065998 | 0 |
| 205 | 470000000.000000 | 0.014465 | 0.394846 | 0.343814 | 0.085161 | 0.074991 | 0 |
| 206 | 60000000.000000 | 0.025159 | 0.395943 | 0.315413 | 0.027419 | 0.009567 | 0 |
| 207 | 640000000.000000 | 0.058915 | 0.415572 | 0.351737 | 0.017097 | 0.013077 | 0 |
| 208 | 0.000000 | 0.285004 | 1.000000 | 0.341268 | 0.014032 | 0.000853 | 0 |
| 209 | 270000000.000000 | 0.057868 | 0.410337 | 0.330586 | 0.031774 | 0.024581 | 0 |
| 210 | 860000000.000000 | 0.027572 | 0.393682 | 0.333416 | 0.026774 | 0.007674 | 0 |
| 211 | 580000000.000000 | 0.172960 | 0.399166 | 0.327156 | 0.014516 | 0.010658 | 0 |
| 212 | 640000000.000000 | 0.020687 | 0.394247 | 0.308375 | 0.074032 | 0.023653 | 0 |
| 213 | 110000000.000000 | 0.087138 | 0.406956 | 0.321320 | 0.030645 | 0.019616 | 0 |
| 214 | 4000000.000000 | 0.040628 | 0.407555 | 0.331435 | 0.026290 | 0.016075 | 0 |
| 215 | 1000000.000000 | 0.022564 | 0.395061 | 0.320506 | 0.036613 | 0.019365 | 0 |
| 216 | 1.000000 | 0.094796 | 0.423374 | 0.351737 | 0.032581 | 0.009029 | 0 |
| 217 | 850000000.000000 | 0.032682 | 0.402241 | 0.321957 | 0.023226 | 0.010723 | 0 |
| 218 | 550000000.000000 | 0.037642 | 0.396949 | 0.333734 | 0.020645 | 0.005203 | 0 |
| 219 | 180000000.000000 | 0.036927 | 0.388164 | 0.297588 | 0.077742 | 0.463483 | 1 |
| 220 | 430000000.000000 | 0.058930 | 0.356934 | 0.316156 | 0.023065 | 0.108660 | 1 |
2 minority class
2 rows of specific column are imputed to median value
219 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Interest-bearing debt interest rate | Equity to Liability | Operating profit per person | Cash Flow Per Share | Net Worth Turnover Rate (times) | Total expense/Assets |
|---|---|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.003946 | 0.000000 | 0.128740 | 0.009032 | 0.000895 |
| 0.010000 | 0.000000 | 0.014077 | 0.359396 | 0.282793 | 0.010806 | 0.004679 |
| 0.100000 | 0.000000 | 0.019159 | 0.387452 | 0.311771 | 0.017097 | 0.009465 |
| 0.250000 | 0.000200 | 0.024438 | 0.392449 | 0.317818 | 0.021935 | 0.014799 |
| 0.500000 | 0.000314 | 0.033580 | 0.395853 | 0.322505 | 0.029839 | 0.022799 |
| 0.750000 | 0.000499 | 0.052144 | 0.401712 | 0.328606 | 0.043065 | 0.035981 |
| 0.900000 | 0.000718 | 0.085175 | 0.413582 | 0.336917 | 0.064516 | 0.054118 |
| 0.990000 | 0.002153 | 0.221338 | 0.496057 | 0.367096 | 0.172250 | 0.121162 |
| 1.000000 | 0.796080 | 0.798122 | 0.927252 | 0.577386 | 0.510645 | 0.368382 |
[' Interest-bearing debt interest rate', ' Equity to Liability', ' Operating profit per person', ' Cash Flow Per Share', ' Net Worth Turnover Rate (times)', ' Total expense/Assets']
print_skewness_and_handle_outlier(4,7,bank_data_stage_1,'+',.8,True)
****Displaying first 5 rows of features that are having it skewness from 4 to 7****
| | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons |
|---|---|---|
| 0 | 0.147950 | 0.169141 |
| 1 | 0.182251 | 0.208944 |
| 2 | 0.177911 | 0.180581 |
| 3 | 0.154187 | 0.193722 |
| 4 | 0.167502 | 0.212537 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from 4 to 7 ****
*******************************************************BEFORE*******************************************************
| Percentile | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons |
|---|---|---|
| 0.000000 | 0.069656 | 0.000000 |
| 0.010000 | 0.142050 | 0.164987 |
| 0.100000 | 0.162863 | 0.201002 |
| 0.250000 | 0.173613 | 0.214617 |
| 0.500000 | 0.184147 | 0.224355 |
| 0.750000 | 0.199191 | 0.238253 |
| 0.900000 | 0.223720 | 0.257928 |
| 0.990000 | 0.304567 | 0.329948 |
| 1.000000 | 0.549197 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.8 percentile **********
| | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons | Bankrupt? |
|---|---|---|---|
| 0 | 0.534280 | 1.000000 | 0 |
0 minority class
0 rows of specific column are imputed to median value
1 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons |
|---|---|---|
| 0.000000 | 0.069656 | 0.000000 |
| 0.010000 | 0.142050 | 0.164984 |
| 0.100000 | 0.162859 | 0.201002 |
| 0.250000 | 0.173613 | 0.214617 |
| 0.500000 | 0.184147 | 0.224355 |
| 0.750000 | 0.199191 | 0.238253 |
| 0.900000 | 0.223682 | 0.257918 |
| 0.990000 | 0.304087 | 0.329708 |
| 1.000000 | 0.549197 | 0.779522 |
[' Net Value Per Share (B)', ' Persistent EPS in the Last Four Seasons']
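The skewness bands passed to `print_skewness_and_handle_outlier` (4 to 7 in the call above) refer to per-column skewness. As a rough illustration of what such a number measures, here is a minimal sketch of the bias-adjusted Fisher-Pearson sample skewness in plain Python. The helper name `sample_skewness` and the toy values are assumptions made for illustration; the notebook's own function is not shown here and may compute skewness differently.

```python
import math


def sample_skewness(values):
    """Bias-adjusted Fisher-Pearson sample skewness (hypothetical helper)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    # Second and third central moments (population form)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant column: no asymmetry to measure
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


# Toy data (not taken from the bankruptcy dataset)
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [0.18, 0.19, 0.20, 0.21, 0.22, 1.00]  # one large value

print(sample_skewness(symmetric))         # 0.0
print(sample_skewness(right_tailed) > 0)  # True: long right tail
```

A single extreme value on the right, such as the 1.000000 seen in `Persistent EPS in the Last Four Seasons`, is enough to push the skewness well above zero, which is why removing one row can change the band a column falls into.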
print_skewness_and_handle_outlier(-10,-2,bank_data_stage_1,'-',.01)
****Displaying first 5 rows of features that are having it skewness from -10 to -2****
| | Operating Gross Margin |
|---|---|
| 0 | 0.601457 |
| 1 | 0.610235 |
| 2 | 0.601450 |
| 3 | 0.583541 |
| 4 | 0.598783 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from -10 to -2 ****
*******************************************************BEFORE*******************************************************
| Percentile | Operating Gross Margin |
|---|---|
| 0.000000 | 0.156308 |
| 0.010000 | 0.581603 |
| 0.100000 | 0.596519 |
| 0.250000 | 0.600398 |
| 0.500000 | 0.605897 |
| 0.750000 | 0.613593 |
| 0.900000 | 0.622794 |
| 0.990000 | 0.651947 |
| 1.000000 | 0.665151 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.01 percentile **********
| Operating Gross Margin | Bankrupt? |
|---|---|

0 minority class
0 rows of specific column are imputed to median value
0 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Operating Gross Margin |
|---|---|
| 0.000000 | 0.156308 |
| 0.010000 | 0.581603 |
| 0.100000 | 0.596519 |
| 0.250000 | 0.600398 |
| 0.500000 | 0.605897 |
| 0.750000 | 0.613593 |
| 0.900000 | 0.622794 |
| 0.990000 | 0.651947 |
| 1.000000 | 0.665151 |
[' Operating Gross Margin']
print_skewness_and_handle_outlier(-20,-10,bank_data_stage_1,'-',.01,True)
****Displaying first 5 rows of features that are having it skewness from -20 to -10****
| | Interest Expense Ratio | Interest Coverage Ratio (Interest expense to EBIT) | No-credit Interval | Retained Earnings to Total Assets |
|---|---|---|---|---|
| 0 | 0.629951 | 0.564050 | 0.622879 | 0.903225 |
| 1 | 0.635172 | 0.570175 | 0.623652 | 0.931065 |
| 2 | 0.629631 | 0.563706 | 0.623841 | 0.909903 |
| 3 | 0.630228 | 0.564663 | 0.622929 | 0.906902 |
| 4 | 0.636055 | 0.575617 | 0.623521 | 0.913850 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from -20 to -10 ****
*******************************************************BEFORE*******************************************************
| Percentile | Interest Expense Ratio | Interest Coverage Ratio (Interest expense to EBIT) | No-credit Interval | Retained Earnings to Total Assets |
|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.594171 |
| 0.010000 | 0.624162 | 0.554522 | 0.615306 | 0.857619 |
| 0.100000 | 0.630140 | 0.564537 | 0.622963 | 0.916745 |
| 0.250000 | 0.630612 | 0.565158 | 0.623634 | 0.931017 |
| 0.500000 | 0.630708 | 0.565268 | 0.623874 | 0.937619 |
| 0.750000 | 0.631151 | 0.565750 | 0.624153 | 0.944625 |
| 0.900000 | 0.632346 | 0.567043 | 0.624667 | 0.952788 |
| 0.990000 | 0.638543 | 0.573041 | 0.632929 | 0.970480 |
| 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.01 percentile **********
| | Interest Expense Ratio | Interest Coverage Ratio (Interest expense to EBIT) | No-credit Interval | Retained Earnings to Total Assets | Bankrupt? |
|---|---|---|---|---|---|
| 0 | 0.633188 | 0.567650 | 0.000000 | 0.926604 | 0 |
| 1 | 0.606654 | 0.000000 | 0.622948 | 0.940913 | 0 |
| 2 | 0.000000 | 0.707735 | 0.622525 | 0.928672 | 0 |
0 minority class
0 rows of specific column are imputed to median value
3 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Interest Expense Ratio | Interest Coverage Ratio (Interest expense to EBIT) | No-credit Interval | Retained Earnings to Total Assets |
|---|---|---|---|---|
| 0.000000 | 0.459985 | 0.172065 | 0.408682 | 0.594171 |
| 0.010000 | 0.624199 | 0.554684 | 0.615353 | 0.857615 |
| 0.100000 | 0.630141 | 0.564537 | 0.622970 | 0.916733 |
| 0.250000 | 0.630612 | 0.565158 | 0.623635 | 0.931024 |
| 0.500000 | 0.630709 | 0.565268 | 0.623874 | 0.937620 |
| 0.750000 | 0.631151 | 0.565747 | 0.624153 | 0.944626 |
| 0.900000 | 0.632345 | 0.567040 | 0.624668 | 0.952794 |
| 0.990000 | 0.638549 | 0.572922 | 0.632929 | 0.970484 |
| 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
[' Interest Expense Ratio', ' Interest Coverage Ratio (Interest expense to EBIT)', ' No-credit Interval', ' Retained Earnings to Total Assets']
# Do not remove any outlier rows for this band; only inspect the skewness
print_skewness_and_handle_outlier(-40,-20,bank_data_stage_1,'-',.01,False)
# The final argument is False, so no outlier rows are removed; the outlier table is still displayed for analysis
****Displaying first 5 rows of features that are having it skewness from -40 to -20****
| | Net Income to Stockholder's Equity | Working Capital/Equity | Working capitcal Turnover Rate | After-tax Net Profit Growth Rate |
|---|---|---|---|---|
| 0 | 0.827890 | 0.721275 | 0.593831 | 0.688979 |
| 1 | 0.839969 | 0.731975 | 0.593916 | 0.689693 |
| 2 | 0.836774 | 0.742729 | 0.594502 | 0.689463 |
| 3 | 0.834697 | 0.729825 | 0.593889 | 0.689110 |
| 4 | 0.839973 | 0.732000 | 0.593915 | 0.689697 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from -40 to -20 ****
*******************************************************BEFORE*******************************************************
| Percentile | Net Income to Stockholder's Equity | Working Capital/Equity | Working capitcal Turnover Rate | After-tax Net Profit Growth Rate |
|---|---|---|---|---|
| 0.000000 | 0.000000 | 0.000000 | 0.572892 | 0.180701 |
| 0.010000 | 0.827363 | 0.723544 | 0.593802 | 0.678521 |
| 0.100000 | 0.837844 | 0.731788 | 0.593912 | 0.688885 |
| 0.250000 | 0.840105 | 0.733604 | 0.593934 | 0.689268 |
| 0.500000 | 0.841157 | 0.735983 | 0.593962 | 0.689438 |
| 0.750000 | 0.842323 | 0.738523 | 0.594000 | 0.689648 |
| 0.900000 | 0.843493 | 0.740703 | 0.594064 | 0.690009 |
| 0.990000 | 0.846347 | 0.743652 | 0.594440 | 0.696843 |
| 1.000000 | 1.000000 | 0.825197 | 0.605383 | 1.000000 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.01 percentile **********
| | Net Income to Stockholder's Equity | Working Capital/Equity | Working capitcal Turnover Rate | After-tax Net Profit Growth Rate | Bankrupt? |
|---|---|---|---|---|---|
| 0 | 0.442176 | 0.000000 | 0.593870 | 0.685937 | 1 |
| 1 | 0.000000 | 0.517571 | 0.593862 | 0.685068 | 1 |
2 minority class
2 rows of specific column are imputed to median value
0 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Net Income to Stockholder's Equity | Working Capital/Equity | Working capitcal Turnover Rate | After-tax Net Profit Growth Rate |
|---|---|---|---|---|
| 0.000000 | 0.344652 | 0.507149 | 0.572892 | 0.180701 |
| 0.010000 | 0.827504 | 0.723756 | 0.593802 | 0.678521 |
| 0.100000 | 0.837850 | 0.731798 | 0.593912 | 0.688887 |
| 0.250000 | 0.840106 | 0.733609 | 0.593934 | 0.689268 |
| 0.500000 | 0.841159 | 0.735983 | 0.593962 | 0.689438 |
| 0.750000 | 0.842323 | 0.738524 | 0.594000 | 0.689648 |
| 0.900000 | 0.843493 | 0.740703 | 0.594064 | 0.690010 |
| 0.990000 | 0.846347 | 0.743653 | 0.594440 | 0.696846 |
| 1.000000 | 1.000000 | 0.825197 | 0.605383 | 1.000000 |
[" Net Income to Stockholder's Equity", ' Working Capital/Equity', ' Working capitcal Turnover Rate', ' After-tax Net Profit Growth Rate']
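The summaries printed above ("minority class", "imputed to median value", "rows are removed") suggest a two-way policy: outlier values in bankrupt rows are replaced by the column median so that scarce positive examples are kept, while outlier rows from the majority class are dropped when the last argument is True. The sketch below illustrates that policy with pandas on toy data. The function `handle_low_outliers`, its parameters, and the column names `ratio_a` and `ratio_b` are hypothetical. Treating the fifth argument as an absolute cut-off on the scaled values is also an assumption, inferred from the flagged rows holding 0.000000; the body of `print_skewness_and_handle_outlier` is not shown here and may differ.

```python
import pandas as pd


def handle_low_outliers(df, columns, cutoff, target="Bankrupt?", remove=True):
    """Hypothetical helper: values below `cutoff` count as outliers.

    Minority-class rows (target == 1) are kept and the offending value is
    replaced with the column median. Majority-class rows that contain an
    outlier are dropped only when `remove` is True.
    """
    out = df.copy()
    # Medians are computed once, before any value is replaced
    medians = out[columns].median()
    low = out[columns] < cutoff          # boolean frame, one flag per cell
    minority = out[target] == 1
    for col in columns:
        out.loc[low[col] & minority, col] = medians[col]
    if remove:
        drop_mask = low.any(axis=1) & ~minority
        out = out.loc[~drop_mask]
    return out


# Toy data (not taken from the bankruptcy dataset)
toy = pd.DataFrame({
    "ratio_a": [0.63, 0.00, 0.62, 0.64, 0.00],
    "ratio_b": [0.56, 0.57, 0.00, 0.55, 0.58],
    "Bankrupt?": [0, 0, 1, 0, 1],
})

cleaned = handle_low_outliers(toy, ["ratio_a", "ratio_b"], cutoff=0.01)
print(cleaned)
```

In this toy run the majority-class row with `ratio_a` equal to 0 is dropped, and the two bankrupt rows keep their place with the zero replaced by the column median. With `remove=False` all five rows survive and only the two minority values change, which matches the "2 rows ... imputed, 0 rows are removed" pattern in the output above.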
print_skewness_and_handle_outlier(-85,-50,bank_data_stage_1,'-',.01,True)
****Displaying first 5 rows of features that are having it skewness from -85 to -50****
| | Operating Profit Growth Rate | Operating Profit Rate |
|---|---|---|
| 0 | 0.848195 | 0.998969 |
| 1 | 0.848088 | 0.998946 |
| 2 | 0.848094 | 0.998857 |
| 3 | 0.848005 | 0.998700 |
| 4 | 0.848258 | 0.998973 |
**** Box plot before removal of outliers****
**** Displaying percentile of columns that are skewed from -85 to -50 ****
*******************************************************BEFORE*******************************************************
| Percentile | Operating Profit Growth Rate | Operating Profit Rate |
|---|---|---|
| 0.000000 | 0.736430 | 0.973424 |
| 0.010000 | 0.846378 | 0.997955 |
| 0.100000 | 0.847886 | 0.998870 |
| 0.250000 | 0.847983 | 0.998970 |
| 0.500000 | 0.848043 | 0.999022 |
| 0.750000 | 0.848123 | 0.999093 |
| 0.900000 | 0.848285 | 0.999190 |
| 0.990000 | 0.851110 | 0.999399 |
| 1.000000 | 1.000000 | 0.999778 |
SKEWNESS *******************************************************BEFORE*******************************************************
**********Outliers beyond 0.01 percentile **********
| Operating Profit Growth Rate | Operating Profit Rate | Bankrupt? |
|---|---|---|

0 minority class
0 rows of specific column are imputed to median value
0 rows are removed
SKEWNESS *******************************************************AFTER*******************************************************
**** Box plot after removal of outliers****
PERCENTILE *******************************************************AFTER*******************************************************
| Percentile | Operating Profit Growth Rate | Operating Profit Rate |
|---|---|---|
| 0.000000 | 0.736430 | 0.973424 |
| 0.010000 | 0.846378 | 0.997955 |
| 0.100000 | 0.847886 | 0.998870 |
| 0.250000 | 0.847983 | 0.998970 |
| 0.500000 | 0.848043 | 0.999022 |
| 0.750000 | 0.848123 | 0.999093 |
| 0.900000 | 0.848285 | 0.999190 |
| 0.990000 | 0.851110 | 0.999399 |
| 1.000000 | 1.000000 | 0.999778 |
[' Operating Profit Growth Rate', ' Operating Profit Rate']
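# Separate the input features from the target column
X = bank_data_stage_1.drop('Bankrupt?', axis=1)
y = bank_data_stage_1["Bankrupt?"]
# Hold out 20% as a test set; stratify=y keeps the class ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Rebuild a training-only frame that holds the features and the target together
bank_data_stage_2 = X_train.copy()
bank_data_stage_2["Bankrupt?"] = y_train
# Check the shapes of the training and testing sets
X_train.shape, y_train.shape, X_test.shape, y_test.shape, bank_data_stage_2.shape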
((5172, 73), (5172,), (1294, 73), (1294,), (5172, 74))
# Check the class balance in each split
y_test.value_counts(),y_train.value_counts()
(0    1250
 1      44
 Name: Bankrupt?, dtype: int64,
 0    4996
 1     176
 Name: Bankrupt?, dtype: int64)
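A quick arithmetic check on the counts above confirms that the stratified split kept the share of bankrupt firms essentially identical in both sets. This is a minimal sketch in plain Python; the dictionary `splits` simply restates the `value_counts()` output, and the variable names are chosen here for illustration.

```python
# Counts copied from the value_counts() output above
splits = {
    "train": {0: 4996, 1: 176},
    "test": {0: 1250, 1: 44},
}

shares = {}
for name, counts in splits.items():
    total = counts[0] + counts[1]
    shares[name] = counts[1] / total  # fraction of bankrupt firms
    print(f"{name}: {counts[1]}/{total} bankrupt = {shares[name]:.2%}")

# train: 176/5172 bankrupt = 3.40%
# test: 44/1294 bankrupt = 3.40%
```

With only about 3.4% positives in each split, plain accuracy is a weak yardstick: a model that always predicts "not bankrupt" would already score about 96.6%.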
|
| 1.2 | 3.5 | .843 | .9.6 | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1.6 | 1.5 | 0.43 | .9.5 | 0 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 5.9 | 95 | 0.5 | .9.8 | 0 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3.5 | 9.6 | .3 | .0.23 | 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

| Features | Score |
|---|---|
| Feature 1 | 5 |
| Feature 2 | 25 |
| Feature 3 | 10 |
| Feature 4 | 15 |

| Features | Score |
|---|---|
| Feature 2 | 25 |
| Feature 4 | 15 |
| Feature 3 | 10 |
| Feature 1 | 5 |
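
The ranking step illustrated by the two toy score tables above can be sketched in a few lines. This is only a minimal illustration: the scores are the toy values from the tables, and the cut-off `k = 2` and the names `scores`, `ranked` and `top_k` are assumptions made for the example, not part of the original notebook.

```python
# Toy feature scores copied from the illustration table above
scores = {"Feature 1": 5, "Feature 2": 25, "Feature 3": 10, "Feature 4": 15}

# Rank the features from highest to lowest score
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Keep the k best-scoring features (k = 2 is a hypothetical choice)
k = 2
top_k = [name for name, _ in ranked[:k]]

for name, score in ranked:
    print(f"{name}: {score}")
print("Selected:", top_k)
```

With these toy values the ranking reproduces the order of the sorted table, and the two selected features are Feature 2 and Feature 4.
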

| | columns | score | p_value |
|---|---|---|---|
| 0 | Retained Earnings to Total Assets | 403.773118 | 0.0000 |
| 1 | Debt ratio % | 382.439927 | 0.0000 |
| 2 | Net worth/Assets | 382.439927 | 0.0000 |
| 3 | ROA(C) before interest and depreciation befor... | 372.487581 | 0.0000 |
| 4 | Persistent EPS in the Last Four Seasons | 310.063059 | 0.0000 |
| 5 | Cash/Current Liability | 286.252677 | 0.0000 |
| 6 | Current Liability to Current Assets | 284.595750 | 0.0000 |
| 7 | Working Capital to Total Assets | 247.313865 | 0.0000 |
| 8 | Current Liability to Assets | 225.122244 | 0.0000 |
| 9 | Borrowing dependency | 205.120464 | 0.0000 |
| 10 | Net Income to Stockholder's Equity | 191.725273 | 0.0000 |
| 11 | Net Value Per Share (B) | 172.137867 | 0.0000 |
| 12 | Liability-Assets Flag | 146.025382 | 0.0000 |
| 13 | Equity to Long-term Liability | 145.974589 | 0.0000 |
| 14 | Working Capital/Equity | 128.954106 | 0.0000 |
| 15 | Non-industry income and expenditure/revenue | 110.983253 | 0.0000 |
| 16 | Total expense/Assets | 90.930228 | 0.0000 |
| 17 | Total income/Total expense | 85.334556 | 0.0000 |
| 18 | Operating Gross Margin | 74.137990 | 0.0000 |
| 19 | Operating profit per person | 64.427347 | 0.0000 |
| 20 | Tax rate (A) | 62.685816 | 0.0000 |
| 21 | Cash/Total Assets | 57.975673 | 0.0000 |
| 22 | Total assets to GNP price | 57.198236 | 0.0000 |
| 23 | Quick Assets/Total Assets | 52.781185 | 0.0000 |
| 24 | Current Ratio | 51.061844 | 0.0000 |
| 25 | Quick Assets/Current Liability | 45.741873 | 0.0000 |
| 26 | CFO to Assets | 43.589989 | 0.0000 |
| 27 | Equity to Liability | 42.928794 | 0.0000 |
| 28 | Fixed Assets Turnover Frequency | 39.380221 | 0.0000 |
| 29 | Inventory and accounts receivable/Net value | 38.909708 | 0.0000 |
| 30 | Operating Profit Rate | 38.850426 | 0.0000 |
| 31 | Cash flow rate | 36.832999 | 0.0000 |
| 32 | Total Asset Turnover | 34.540194 | 0.0000 |
| 33 | Contingent liabilities/Net worth | 32.300455 | 0.0000 |
| 34 | Average Collection Days | 28.537532 | 0.0000 |
| 35 | Fixed Assets to Assets | 28.537532 | 0.0000 |
| 36 | Total debt/Total net worth | 28.537532 | 0.0000 |
| 37 | Allocation rate per person | 28.537532 | 0.0000 |
| 38 | Accounts Receivable Turnover | 28.537532 | 0.0000 |
| 39 | Revenue per person | 28.537532 | 0.0000 |
| 40 | Net Value Growth Rate | 28.537532 | 0.0000 |
| 41 | Quick Ratio | 28.537532 | 0.0000 |
| 42 | Total Asset Return Growth Rate Ratio | 28.404717 | 0.0000 |
| 43 | Cash Flow Per Share | 26.275563 | 0.0000 |
| 44 | Working capitcal Turnover Rate | 25.605834 | 0.0000 |
| 45 | Operating Profit Growth Rate | 21.745893 | 0.0000 |
| 46 | Cash Flow to Total Assets | 20.771819 | 0.0000 |
| 47 | Current Assets/Total Assets | 19.982075 | 0.0000 |
| 48 | Revenue Per Share (Yuan ¥) | 18.624912 | 0.0000 |
| 49 | Cash Flow to Equity | 16.779115 | 0.0000 |
| 50 | Total Asset Growth Rate | 14.101149 | 0.0002 |
| 51 | Cash Reinvestment % | 11.383942 | 0.0007 |
| 52 | Cash Flow to Liability | 10.767865 | 0.0010 |
| 53 | After-tax Net Profit Growth Rate | 9.833110 | 0.0017 |
| 54 | Long-term fund suitability ratio (A) | 8.151000 | 0.0043 |
| 55 | Research and development expense rate | 6.270324 | 0.0123 |
| 56 | Cash Turnover Rate | 4.142295 | 0.0419 |
| 57 | Quick Asset Turnover Rate | 3.168890 | 0.0751 |
| 58 | Continuous Net Profit Growth Rate | 3.054471 | 0.0806 |
| 59 | Current Liabilities/Liability | 2.525012 | 0.1121 |
| 60 | Degree of Financial Leverage (DFL) | 1.969514 | 0.1606 |
| 61 | Current Asset Turnover Rate | 1.738746 | 0.1874 |
| 62 | Net Worth Turnover Rate (times) | 1.572778 | 0.2099 |
| 63 | Realized Sales Gross Profit Growth Rate | 1.315570 | 0.2514 |
| 64 | Interest Coverage Ratio (Interest expense to ... | 0.341999 | 0.5587 |
| 65 | Long-term Liability to Current Assets | 0.259598 | 0.6104 |
| 66 | Inventory/Current Liability | 0.187725 | 0.6648 |
| 67 | No-credit Interval | 0.147786 | 0.7007 |
| 68 | Interest-bearing debt interest rate | 0.066144 | 0.7970 |
| 69 | Operating Expense Rate | 0.035123 | 0.8513 |
| 70 | Inventory/Working Capital | 0.014130 | 0.9054 |
| 71 | Interest Expense Ratio | 0.010103 | 0.9199 |
| 72 | Inventory Turnover Rate (times) | 0.005834 | 0.9391 |
plt.figure(figsize=(12,5),dpi=200)
sns.barplot(data=feat_score.head(50),y='score',x='columns')
plt.xticks(rotation=90)
plt.show()
def gather_outliers(data,cols,show_table=True,IQR_const=[1.5,1.7,1.8,1.9,2,2.1,2.2,2.5,2.9,3]):
    '''
    Summarise how many rows are flagged as outliers at several IQR multipliers.

    PARAMETERS:
    data: dataframe holding the features and the 'Bankrupt?' column
    cols: list of feature names to check
    show_table: bool, displays the outlier rows of each feature if True (default True)
    IQR_const: list of IQR multipliers for outlier detection,
               default value is [1.5,1.7,1.8,1.9,2,2.1,2.2,2.5,2.9,3]

    RETURNS:
    A tuple (table, indices). The table gives, for each IQR multiplier, the
    percentage of rows flagged as outliers per feature, plus a "Total" row with the
    percentage of rows flagged in at least one feature. The indices are the set of
    row indices flagged at the last multiplier in IQR_const, i.e. the rows that
    would be lost if those outliers were dropped.
    '''
    # Initializing the dict object to store the outlier percentage of each feature
    outlier_dict={}
    # Iterating over the IQR multipliers
    for IQR_constant in IQR_const:
        # Initializing the key for this IQR multiplier
        outlier_dict[f'{IQR_constant} * IQR']={}
        # Initializing the set that stores all outlier indices for this IQR multiplier
        total_indices=set()
        # Iterating over the features
        for i in cols:
            # Initializing the set that stores outlier indices for this feature
            outlier_index_per_feat=set()
            # Calculating the first quartile
            first_quantile=data[i].quantile(.25)
            # Calculating the third quartile
            third_quantile=data[i].quantile(.75)
            # Calculating the IQR
            IQR=third_quantile-first_quantile
            # Calculating the lower bound
            lower_limit=first_quantile - IQR*IQR_constant
            # Calculating the upper bound
            upper_limit=third_quantile + IQR*IQR_constant
            # Filtering the dataframe based on the upper bound
            upper_outliers=data.loc[data[i] > upper_limit,cols+['Bankrupt?']]
            # Adding the indices of the upper outliers to both sets
            outlier_index_per_feat.update(upper_outliers.index)
            total_indices.update(upper_outliers.index)
            if show_table:
                display(upper_outliers)
            # Filtering the dataframe based on the lower bound
            lower_outlier=data.loc[data[i] < lower_limit,cols+['Bankrupt?']]
            # Adding the indices of the lower outliers to both sets
            outlier_index_per_feat.update(lower_outlier.index)
            total_indices.update(lower_outlier.index)
            if show_table:
                display(lower_outlier)
            # Storing the percentage of rows flagged as outliers for this feature
            outlier_dict[f'{IQR_constant} * IQR'][i]=np.round((len(outlier_index_per_feat)/data.shape[0])*100,2)
        # Storing the total percentage of rows flagged for this IQR multiplier
        outlier_dict[f'{IQR_constant} * IQR']["Total"]=np.round((len(total_indices)/data.shape[0])*100,2)
    return pd.DataFrame(outlier_dict),total_indices
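
The fence arithmetic inside `gather_outliers` can be checked by hand on a small list. The sketch below is a standalone illustration, not the notebook's code: the helper names `iqr_fences` and `outlier_share` and the toy `sample` are hypothetical, and it uses the standard library's `statistics.quantiles` with `method="inclusive"`, which is assumed here to interpolate linearly in the same way as the default `quantile` call used above.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outlier_share(values, k=1.5):
    """Percentage of values lying outside the fences for multiplier k."""
    lower, upper = iqr_fences(values, k)
    flagged = [v for v in values if v < lower or v > upper]
    return round(100 * len(flagged) / len(values), 2)

# Hypothetical toy data: nine evenly spaced values and one large one
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 16]

for k in (1.5, 3):
    print(k, iqr_fences(sample, k), outlier_share(sample, k))
```

Here the quartiles are 3.25 and 7.75, so the IQR is 4.5. At 1.5 * IQR the fences are (-3.5, 14.5) and the value 16 is flagged (10% of the sample); at 3 * IQR the fences widen to (-10.25, 21.25) and nothing is flagged. This is the same pattern as in the outlier table, where the percentages shrink as the multiplier grows.
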
# Calculating the outlier occurrence for the non-bankrupt companies
outlier_indices=gather_outliers(bank_data_stage_2[bank_data_stage_2['Bankrupt?']==0],skewness_Range(-80,86),False)
# Sorting the outlier table in ascending order of its first column (1.5 * IQR)
outlier_indices[0].sort_values('1.5 * IQR',inplace=True)
ascendingly_order_outlier_occr_feat=outlier_indices[0].index
outlier_indices[0].style.bar()
| | 1.5 * IQR | 1.7 * IQR | 1.8 * IQR | 1.9 * IQR | 2 * IQR | 2.1 * IQR | 2.2 * IQR | 2.5 * IQR | 2.9 * IQR | 3 * IQR |
|---|---|---|---|---|---|---|---|---|---|---|
| Cash Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Operating Expense Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Asset Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Inventory Turnover Rate (times) | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Assets/Total Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Assets/Total Assets | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Debt ratio % | 0.120000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Net worth/Assets | 0.120000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liabilities/Liability | 0.500000 | 0.180000 | 0.060000 | 0.060000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Fixed Assets to Assets | 0.600000 | 0.200000 | 0.080000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital to Total Assets | 1.180000 | 0.460000 | 0.240000 | 0.140000 | 0.140000 | 0.060000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 |
| Current Liability to Assets | 1.300000 | 0.580000 | 0.520000 | 0.420000 | 0.240000 | 0.120000 | 0.060000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital/Equity | 1.440000 | 0.980000 | 0.960000 | 0.800000 | 0.700000 | 0.660000 | 0.620000 | 0.480000 | 0.300000 | 0.280000 |
| Tax rate (A) | 1.740000 | 1.500000 | 1.360000 | 1.260000 | 1.160000 | 1.000000 | 0.940000 | 0.660000 | 0.420000 | 0.380000 |
| Average Collection Days | 2.180000 | 1.600000 | 1.320000 | 1.060000 | 0.980000 | 0.900000 | 0.740000 | 0.560000 | 0.400000 | 0.360000 |
| Research and development expense rate | 2.580000 | 1.060000 | 0.520000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liability to Current Assets | 3.460000 | 2.820000 | 2.560000 | 2.380000 | 2.260000 | 2.120000 | 1.920000 | 1.620000 | 1.300000 | 1.280000 |
| Borrowing dependency | 3.460000 | 2.760000 | 2.480000 | 2.260000 | 2.120000 | 2.000000 | 1.740000 | 1.360000 | 1.020000 | 0.960000 |
| Interest-bearing debt interest rate | 3.600000 | 3.060000 | 2.760000 | 2.640000 | 2.420000 | 2.360000 | 2.320000 | 2.000000 | 1.720000 | 1.680000 |
| Operating Gross Margin | 4.620000 | 3.840000 | 3.540000 | 3.320000 | 3.060000 | 2.840000 | 2.580000 | 2.180000 | 1.360000 | 1.180000 |
| CFO to Assets | 4.760000 | 3.580000 | 3.100000 | 2.720000 | 2.360000 | 1.980000 | 1.640000 | 1.020000 | 0.700000 | 0.660000 |
| Total debt/Total net worth | 5.100000 | 4.200000 | 3.840000 | 3.560000 | 3.200000 | 3.020000 | 2.680000 | 2.020000 | 1.480000 | 1.360000 |
| Equity to Long-term Liability | 5.260000 | 4.260000 | 3.760000 | 3.440000 | 3.140000 | 2.820000 | 2.620000 | 2.060000 | 1.520000 | 1.480000 |
| ROA(C) before interest and depreciation before interest | 5.340000 | 3.920000 | 3.500000 | 2.960000 | 2.640000 | 2.160000 | 1.900000 | 1.300000 | 0.700000 | 0.700000 |
| Total expense/Assets | 5.340000 | 4.280000 | 3.860000 | 3.540000 | 3.200000 | 3.040000 | 2.880000 | 2.300000 | 1.740000 | 1.660000 |
| Total Asset Turnover | 5.420000 | 4.580000 | 4.120000 | 3.780000 | 3.460000 | 3.240000 | 2.860000 | 2.240000 | 1.620000 | 1.460000 |
| Inventory and accounts receivable/Net value | 5.480000 | 4.560000 | 4.280000 | 3.940000 | 3.680000 | 3.240000 | 3.000000 | 2.220000 | 1.580000 | 1.480000 |
| Inventory/Current Liability | 6.390000 | 5.480000 | 5.120000 | 4.720000 | 4.500000 | 4.320000 | 4.080000 | 3.620000 | 3.100000 | 3.020000 |
| Net Value Per Share (B) | 6.710000 | 5.580000 | 5.200000 | 4.800000 | 4.420000 | 4.080000 | 3.860000 | 3.200000 | 2.600000 | 2.480000 |
| Persistent EPS in the Last Four Seasons | 6.730000 | 5.460000 | 4.920000 | 4.460000 | 4.240000 | 3.840000 | 3.620000 | 2.760000 | 2.080000 | 1.940000 |
| Total income/Total expense | 6.990000 | 5.880000 | 5.440000 | 4.960000 | 4.540000 | 4.200000 | 4.000000 | 3.160000 | 2.560000 | 2.460000 |
| Revenue Per Share (Yuan ¥) | 7.070000 | 6.290000 | 5.820000 | 5.540000 | 5.120000 | 4.940000 | 4.720000 | 4.080000 | 3.300000 | 3.120000 |
| Net Income to Stockholder's Equity | 7.230000 | 6.160000 | 5.500000 | 4.760000 | 4.400000 | 3.980000 | 3.620000 | 2.860000 | 2.100000 | 2.040000 |
| Cash/Total Assets | 7.430000 | 6.530000 | 6.020000 | 5.500000 | 5.180000 | 4.860000 | 4.720000 | 3.680000 | 2.820000 | 2.680000 |
| Net Worth Turnover Rate (times) | 7.510000 | 6.690000 | 6.240000 | 5.800000 | 5.540000 | 5.420000 | 5.220000 | 4.300000 | 3.440000 | 3.240000 |
| Equity to Liability | 7.850000 | 6.870000 | 6.410000 | 6.020000 | 5.700000 | 5.280000 | 4.940000 | 4.220000 | 3.360000 | 3.180000 |
| Cash Flow Per Share | 8.010000 | 6.410000 | 5.920000 | 5.320000 | 4.820000 | 4.480000 | 4.220000 | 3.400000 | 2.740000 | 2.640000 |
| Working capitcal Turnover Rate | 8.030000 | 7.030000 | 6.570000 | 6.140000 | 5.800000 | 5.460000 | 5.200000 | 4.260000 | 3.480000 | 3.340000 |
| Cash flow rate | 8.330000 | 6.910000 | 6.290000 | 5.900000 | 5.700000 | 5.420000 | 5.040000 | 3.960000 | 2.780000 | 2.640000 |
| Long-term Liability to Current Assets | 8.470000 | 7.650000 | 7.330000 | 6.910000 | 6.590000 | 6.270000 | 5.860000 | 5.260000 | 4.500000 | 4.360000 |
| Current Ratio | 8.650000 | 7.730000 | 7.370000 | 6.990000 | 6.530000 | 6.180000 | 5.780000 | 4.840000 | 3.880000 | 3.700000 |
| Cash Reinvestment % | 8.730000 | 7.350000 | 6.610000 | 6.120000 | 5.620000 | 5.360000 | 4.880000 | 4.000000 | 2.940000 | 2.760000 |
| Quick Ratio | 8.750000 | 7.770000 | 7.170000 | 6.810000 | 6.470000 | 6.060000 | 5.840000 | 5.100000 | 4.200000 | 3.980000 |
| Retained Earnings to Total Assets | 8.790000 | 7.070000 | 6.370000 | 5.760000 | 5.240000 | 4.960000 | 4.500000 | 3.740000 | 3.060000 | 2.920000 |
| Quick Assets/Current Liability | 9.150000 | 8.050000 | 7.490000 | 7.130000 | 6.770000 | 6.390000 | 6.080000 | 5.380000 | 4.300000 | 4.080000 |
| Accounts Receivable Turnover | 9.250000 | 8.290000 | 7.910000 | 7.630000 | 7.250000 | 6.910000 | 6.710000 | 5.980000 | 5.080000 | 4.980000 |
| Allocation rate per person | 9.370000 | 8.390000 | 8.010000 | 7.530000 | 7.210000 | 6.930000 | 6.650000 | 5.720000 | 5.080000 | 4.920000 |
| Operating Profit Rate | 9.610000 | 7.850000 | 7.050000 | 6.510000 | 6.100000 | 5.500000 | 5.200000 | 4.200000 | 3.480000 | 3.180000 |
| Total Asset Return Growth Rate Ratio | 9.650000 | 8.010000 | 7.370000 | 6.770000 | 6.180000 | 5.640000 | 5.220000 | 4.380000 | 3.320000 | 3.100000 |
| Cash/Current Liability | 10.490000 | 9.650000 | 9.270000 | 8.930000 | 8.650000 | 8.430000 | 8.250000 | 7.470000 | 6.790000 | 6.590000 |
| Net Value Growth Rate | 10.610000 | 9.550000 | 8.970000 | 8.630000 | 8.230000 | 7.890000 | 7.490000 | 6.650000 | 5.400000 | 5.220000 |
| Revenue per person | 10.670000 | 10.010000 | 9.590000 | 9.230000 | 8.790000 | 8.470000 | 8.190000 | 7.190000 | 6.350000 | 6.120000 |
| Total assets to GNP price | 11.550000 | 10.710000 | 10.230000 | 9.830000 | 9.310000 | 8.970000 | 8.730000 | 8.190000 | 7.630000 | 7.510000 |
| Realized Sales Gross Profit Growth Rate | 11.810000 | 10.650000 | 10.230000 | 9.950000 | 9.630000 | 9.270000 | 8.870000 | 8.030000 | 7.070000 | 6.890000 |
| Operating profit per person | 11.850000 | 10.310000 | 9.910000 | 9.150000 | 8.710000 | 8.370000 | 7.790000 | 6.830000 | 5.960000 | 5.680000 |
| Long-term fund suitability ratio (A) | 11.930000 | 11.270000 | 10.870000 | 10.590000 | 10.310000 | 10.090000 | 9.810000 | 9.010000 | 8.090000 | 7.970000 |
| Cash Flow to Equity | 12.170000 | 10.590000 | 9.450000 | 8.730000 | 8.150000 | 7.610000 | 7.170000 | 5.780000 | 4.060000 | 3.860000 |
| Cash Flow to Total Assets | 12.510000 | 11.050000 | 10.170000 | 9.550000 | 8.870000 | 8.190000 | 7.690000 | 6.220000 | 4.840000 | 4.540000 |
| Inventory/Working Capital | 13.230000 | 11.810000 | 11.350000 | 10.910000 | 10.570000 | 10.130000 | 9.870000 | 8.810000 | 7.870000 | 7.610000 |
| Contingent liabilities/Net worth | 13.710000 | 12.550000 | 12.110000 | 11.550000 | 10.970000 | 10.370000 | 10.130000 | 9.190000 | 7.810000 | 7.590000 |
| After-tax Net Profit Growth Rate | 14.370000 | 13.330000 | 12.990000 | 12.490000 | 12.190000 | 11.710000 | 11.230000 | 10.350000 | 9.410000 | 9.190000 |
| Operating Profit Growth Rate | 14.390000 | 13.050000 | 12.470000 | 12.010000 | 11.630000 | 11.190000 | 10.750000 | 9.610000 | 8.750000 | 8.430000 |
| Continuous Net Profit Growth Rate | 14.570000 | 13.370000 | 12.850000 | 12.610000 | 12.210000 | 11.890000 | 11.530000 | 10.510000 | 9.310000 | 9.050000 |
| Non-industry income and expenditure/revenue | 15.390000 | 13.770000 | 13.110000 | 12.570000 | 11.990000 | 11.470000 | 11.030000 | 9.450000 | 7.930000 | 7.730000 |
| No-credit Interval | 16.390000 | 14.650000 | 14.090000 | 13.530000 | 13.050000 | 12.530000 | 12.030000 | 11.030000 | 9.750000 | 9.330000 |
| Cash Flow to Liability | 17.270000 | 15.810000 | 15.210000 | 14.510000 | 13.890000 | 13.210000 | 12.870000 | 11.470000 | 9.930000 | 9.590000 |
| Interest Expense Ratio | 17.690000 | 16.410000 | 15.810000 | 15.310000 | 14.790000 | 14.310000 | 13.850000 | 12.590000 | 11.050000 | 10.570000 |
| Interest Coverage Ratio (Interest expense to EBIT) | 18.800000 | 16.970000 | 16.150000 | 15.550000 | 14.990000 | 14.530000 | 14.050000 | 12.610000 | 11.090000 | 10.710000 |
| Total Asset Growth Rate | 19.680000 | 19.660000 | 18.470000 | 13.210000 | 12.270000 | 12.230000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Degree of Financial Leverage (DFL) | 20.020000 | 18.530000 | 18.050000 | 17.610000 | 17.090000 | 16.570000 | 16.050000 | 14.950000 | 13.470000 | 13.130000 |
| Current Asset Turnover Rate | 20.360000 | 20.140000 | 20.000000 | 19.880000 | 19.820000 | 19.740000 | 19.640000 | 19.440000 | 19.260000 | 19.240000 |
| Fixed Assets Turnover Frequency | 20.520000 | 20.240000 | 20.140000 | 20.020000 | 19.960000 | 19.860000 | 19.760000 | 19.500000 | 19.460000 | 19.420000 |
| Total | 92.790000 | 90.630000 | 89.790000 | 87.830000 | 86.930000 | 86.110000 | 84.390000 | 81.930000 | 78.340000 | 77.560000 |
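The "Total" row is not the sum of the per-feature percentages. `gather_outliers` collects row indices in a set (`total_indices.update(...)`), so a row flagged by several features is counted once. The sketch below illustrates this with made-up indices; `flagged` and `n_rows` are hypothetical names, not objects from the notebook.

```python
# Hypothetical row indices flagged as outliers by each feature (toy values).
flagged = {
    "feature_a": {0, 1, 2},
    "feature_b": {2, 3},
    "feature_c": {1, 2, 3, 4},
}
n_rows = 10  # assumed size of the toy dataset

# Percentage of rows flagged by each feature on its own.
per_feature_pct = {
    name: round(len(indices) / n_rows * 100, 2)
    for name, indices in flagged.items()
}

# The total uses the union of indices, so overlapping rows count once.
total_indices = set().union(*flagged.values())
total_pct = round(len(total_indices) / n_rows * 100, 2)

print(per_feature_pct, total_pct)
```

Here the per-feature percentages add up to 90, while the union covers only 50 percent of the rows. The same overlap is why the Total row can stay below 100 even though many features individually flag 10 to 20 percent of the rows.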
outlier_indices[0].iloc[10:,].T.style.bar()
| | Working Capital to Total Assets | Current Liability to Assets | Working Capital/Equity | Tax rate (A) | Average Collection Days | Research and development expense rate | Current Liability to Current Assets | Borrowing dependency | Interest-bearing debt interest rate | Operating Gross Margin | CFO to Assets | Total debt/Total net worth | Equity to Long-term Liability | ROA(C) before interest and depreciation before interest | Total expense/Assets | Total Asset Turnover | Inventory and accounts receivable/Net value | Inventory/Current Liability | Net Value Per Share (B) | Persistent EPS in the Last Four Seasons | Total income/Total expense | Revenue Per Share (Yuan ¥) | Net Income to Stockholder's Equity | Cash/Total Assets | Net Worth Turnover Rate (times) | Equity to Liability | Cash Flow Per Share | Working capitcal Turnover Rate | Cash flow rate | Long-term Liability to Current Assets | Current Ratio | Cash Reinvestment % | Quick Ratio | Retained Earnings to Total Assets | Quick Assets/Current Liability | Accounts Receivable Turnover | Allocation rate per person | Operating Profit Rate | Total Asset Return Growth Rate Ratio | Cash/Current Liability | Net Value Growth Rate | Revenue per person | Total assets to GNP price | Realized Sales Gross Profit Growth Rate | Operating profit per person | Long-term fund suitability ratio (A) | Cash Flow to Equity | Cash Flow to Total Assets | Inventory/Working Capital | Contingent liabilities/Net worth | After-tax Net Profit Growth Rate | Operating Profit Growth Rate | Continuous Net Profit Growth Rate | Non-industry income and expenditure/revenue | No-credit Interval | Cash Flow to Liability | Interest Expense Ratio | Interest Coverage Ratio (Interest expense to EBIT) | Total Asset Growth Rate | Degree of Financial Leverage (DFL) | Current Asset Turnover Rate | Fixed Assets Turnover Frequency | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 * IQR | 1.180000 | 1.300000 | 1.440000 | 1.740000 | 2.180000 | 2.580000 | 3.460000 | 3.460000 | 3.600000 | 4.620000 | 4.760000 | 5.100000 | 5.260000 | 5.340000 | 5.340000 | 5.420000 | 5.480000 | 6.390000 | 6.710000 | 6.730000 | 6.990000 | 7.070000 | 7.230000 | 7.430000 | 7.510000 | 7.850000 | 8.010000 | 8.030000 | 8.330000 | 8.470000 | 8.650000 | 8.730000 | 8.750000 | 8.790000 | 9.150000 | 9.250000 | 9.370000 | 9.610000 | 9.650000 | 10.490000 | 10.610000 | 10.670000 | 11.550000 | 11.810000 | 11.850000 | 11.930000 | 12.170000 | 12.510000 | 13.230000 | 13.710000 | 14.370000 | 14.390000 | 14.570000 | 15.390000 | 16.390000 | 17.270000 | 17.690000 | 18.800000 | 19.680000 | 20.020000 | 20.360000 | 20.520000 | 92.790000 |
| 1.7 * IQR | 0.460000 | 0.580000 | 0.980000 | 1.500000 | 1.600000 | 1.060000 | 2.820000 | 2.760000 | 3.060000 | 3.840000 | 3.580000 | 4.200000 | 4.260000 | 3.920000 | 4.280000 | 4.580000 | 4.560000 | 5.480000 | 5.580000 | 5.460000 | 5.880000 | 6.290000 | 6.160000 | 6.530000 | 6.690000 | 6.870000 | 6.410000 | 7.030000 | 6.910000 | 7.650000 | 7.730000 | 7.350000 | 7.770000 | 7.070000 | 8.050000 | 8.290000 | 8.390000 | 7.850000 | 8.010000 | 9.650000 | 9.550000 | 10.010000 | 10.710000 | 10.650000 | 10.310000 | 11.270000 | 10.590000 | 11.050000 | 11.810000 | 12.550000 | 13.330000 | 13.050000 | 13.370000 | 13.770000 | 14.650000 | 15.810000 | 16.410000 | 16.970000 | 19.660000 | 18.530000 | 20.140000 | 20.240000 | 90.630000 |
| 1.8 * IQR | 0.240000 | 0.520000 | 0.960000 | 1.360000 | 1.320000 | 0.520000 | 2.560000 | 2.480000 | 2.760000 | 3.540000 | 3.100000 | 3.840000 | 3.760000 | 3.500000 | 3.860000 | 4.120000 | 4.280000 | 5.120000 | 5.200000 | 4.920000 | 5.440000 | 5.820000 | 5.500000 | 6.020000 | 6.240000 | 6.410000 | 5.920000 | 6.570000 | 6.290000 | 7.330000 | 7.370000 | 6.610000 | 7.170000 | 6.370000 | 7.490000 | 7.910000 | 8.010000 | 7.050000 | 7.370000 | 9.270000 | 8.970000 | 9.590000 | 10.230000 | 10.230000 | 9.910000 | 10.870000 | 9.450000 | 10.170000 | 11.350000 | 12.110000 | 12.990000 | 12.470000 | 12.850000 | 13.110000 | 14.090000 | 15.210000 | 15.810000 | 16.150000 | 18.470000 | 18.050000 | 20.000000 | 20.140000 | 89.790000 |
| 1.9 * IQR | 0.140000 | 0.420000 | 0.800000 | 1.260000 | 1.060000 | 0.000000 | 2.380000 | 2.260000 | 2.640000 | 3.320000 | 2.720000 | 3.560000 | 3.440000 | 2.960000 | 3.540000 | 3.780000 | 3.940000 | 4.720000 | 4.800000 | 4.460000 | 4.960000 | 5.540000 | 4.760000 | 5.500000 | 5.800000 | 6.020000 | 5.320000 | 6.140000 | 5.900000 | 6.910000 | 6.990000 | 6.120000 | 6.810000 | 5.760000 | 7.130000 | 7.630000 | 7.530000 | 6.510000 | 6.770000 | 8.930000 | 8.630000 | 9.230000 | 9.830000 | 9.950000 | 9.150000 | 10.590000 | 8.730000 | 9.550000 | 10.910000 | 11.550000 | 12.490000 | 12.010000 | 12.610000 | 12.570000 | 13.530000 | 14.510000 | 15.310000 | 15.550000 | 13.210000 | 17.610000 | 19.880000 | 20.020000 | 87.830000 |
| 2 * IQR | 0.140000 | 0.240000 | 0.700000 | 1.160000 | 0.980000 | 0.000000 | 2.260000 | 2.120000 | 2.420000 | 3.060000 | 2.360000 | 3.200000 | 3.140000 | 2.640000 | 3.200000 | 3.460000 | 3.680000 | 4.500000 | 4.420000 | 4.240000 | 4.540000 | 5.120000 | 4.400000 | 5.180000 | 5.540000 | 5.700000 | 4.820000 | 5.800000 | 5.700000 | 6.590000 | 6.530000 | 5.620000 | 6.470000 | 5.240000 | 6.770000 | 7.250000 | 7.210000 | 6.100000 | 6.180000 | 8.650000 | 8.230000 | 8.790000 | 9.310000 | 9.630000 | 8.710000 | 10.310000 | 8.150000 | 8.870000 | 10.570000 | 10.970000 | 12.190000 | 11.630000 | 12.210000 | 11.990000 | 13.050000 | 13.890000 | 14.790000 | 14.990000 | 12.270000 | 17.090000 | 19.820000 | 19.960000 | 86.930000 |
| 2.1 * IQR | 0.060000 | 0.120000 | 0.660000 | 1.000000 | 0.900000 | 0.000000 | 2.120000 | 2.000000 | 2.360000 | 2.840000 | 1.980000 | 3.020000 | 2.820000 | 2.160000 | 3.040000 | 3.240000 | 3.240000 | 4.320000 | 4.080000 | 3.840000 | 4.200000 | 4.940000 | 3.980000 | 4.860000 | 5.420000 | 5.280000 | 4.480000 | 5.460000 | 5.420000 | 6.270000 | 6.180000 | 5.360000 | 6.060000 | 4.960000 | 6.390000 | 6.910000 | 6.930000 | 5.500000 | 5.640000 | 8.430000 | 7.890000 | 8.470000 | 8.970000 | 9.270000 | 8.370000 | 10.090000 | 7.610000 | 8.190000 | 10.130000 | 10.370000 | 11.710000 | 11.190000 | 11.890000 | 11.470000 | 12.530000 | 13.210000 | 14.310000 | 14.530000 | 12.230000 | 16.570000 | 19.740000 | 19.860000 | 86.110000 |
| 2.2 * IQR | 0.040000 | 0.060000 | 0.620000 | 0.940000 | 0.740000 | 0.000000 | 1.920000 | 1.740000 | 2.320000 | 2.580000 | 1.640000 | 2.680000 | 2.620000 | 1.900000 | 2.880000 | 2.860000 | 3.000000 | 4.080000 | 3.860000 | 3.620000 | 4.000000 | 4.720000 | 3.620000 | 4.720000 | 5.220000 | 4.940000 | 4.220000 | 5.200000 | 5.040000 | 5.860000 | 5.780000 | 4.880000 | 5.840000 | 4.500000 | 6.080000 | 6.710000 | 6.650000 | 5.200000 | 5.220000 | 8.250000 | 7.490000 | 8.190000 | 8.730000 | 8.870000 | 7.790000 | 9.810000 | 7.170000 | 7.690000 | 9.870000 | 10.130000 | 11.230000 | 10.750000 | 11.530000 | 11.030000 | 12.030000 | 12.870000 | 13.850000 | 14.050000 | 0.000000 | 16.050000 | 19.640000 | 19.760000 | 84.390000 |
| 2.5 * IQR | 0.020000 | 0.000000 | 0.480000 | 0.660000 | 0.560000 | 0.000000 | 1.620000 | 1.360000 | 2.000000 | 2.180000 | 1.020000 | 2.020000 | 2.060000 | 1.300000 | 2.300000 | 2.240000 | 2.220000 | 3.620000 | 3.200000 | 2.760000 | 3.160000 | 4.080000 | 2.860000 | 3.680000 | 4.300000 | 4.220000 | 3.400000 | 4.260000 | 3.960000 | 5.260000 | 4.840000 | 4.000000 | 5.100000 | 3.740000 | 5.380000 | 5.980000 | 5.720000 | 4.200000 | 4.380000 | 7.470000 | 6.650000 | 7.190000 | 8.190000 | 8.030000 | 6.830000 | 9.010000 | 5.780000 | 6.220000 | 8.810000 | 9.190000 | 10.350000 | 9.610000 | 10.510000 | 9.450000 | 11.030000 | 11.470000 | 12.590000 | 12.610000 | 0.000000 | 14.950000 | 19.440000 | 19.500000 | 81.930000 |
| 2.9 * IQR | 0.000000 | 0.000000 | 0.300000 | 0.420000 | 0.400000 | 0.000000 | 1.300000 | 1.020000 | 1.720000 | 1.360000 | 0.700000 | 1.480000 | 1.520000 | 0.700000 | 1.740000 | 1.620000 | 1.580000 | 3.100000 | 2.600000 | 2.080000 | 2.560000 | 3.300000 | 2.100000 | 2.820000 | 3.440000 | 3.360000 | 2.740000 | 3.480000 | 2.780000 | 4.500000 | 3.880000 | 2.940000 | 4.200000 | 3.060000 | 4.300000 | 5.080000 | 5.080000 | 3.480000 | 3.320000 | 6.790000 | 5.400000 | 6.350000 | 7.630000 | 7.070000 | 5.960000 | 8.090000 | 4.060000 | 4.840000 | 7.870000 | 7.810000 | 9.410000 | 8.750000 | 9.310000 | 7.930000 | 9.750000 | 9.930000 | 11.050000 | 11.090000 | 0.000000 | 13.470000 | 19.260000 | 19.460000 | 78.340000 |
| 3 * IQR | 0.000000 | 0.000000 | 0.280000 | 0.380000 | 0.360000 | 0.000000 | 1.280000 | 0.960000 | 1.680000 | 1.180000 | 0.660000 | 1.360000 | 1.480000 | 0.700000 | 1.660000 | 1.460000 | 1.480000 | 3.020000 | 2.480000 | 1.940000 | 2.460000 | 3.120000 | 2.040000 | 2.680000 | 3.240000 | 3.180000 | 2.640000 | 3.340000 | 2.640000 | 4.360000 | 3.700000 | 2.760000 | 3.980000 | 2.920000 | 4.080000 | 4.980000 | 4.920000 | 3.180000 | 3.100000 | 6.590000 | 5.220000 | 6.120000 | 7.510000 | 6.890000 | 5.680000 | 7.970000 | 3.860000 | 4.540000 | 7.610000 | 7.590000 | 9.190000 | 8.430000 | 9.050000 | 7.730000 | 9.330000 | 9.590000 | 10.570000 | 10.710000 | 0.000000 | 13.130000 | 19.240000 | 19.420000 | 77.560000 |
# Storing the 15 features with the lowest outlier occurrence
first_15_col=list(ascendingly_order_outlier_occr_feat[:15])
# Displaying the outlier occurrence table for the 15 features with the lowest outlier occurrence
gather_outliers(bank_data_stage_2[bank_data_stage_2['Bankrupt?']==0],first_15_col,False)[0].style.bar()
| | 1.5 * IQR | 1.7 * IQR | 1.8 * IQR | 1.9 * IQR | 2 * IQR | 2.1 * IQR | 2.2 * IQR | 2.5 * IQR | 2.9 * IQR | 3 * IQR |
|---|---|---|---|---|---|---|---|---|---|---|
| Cash Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Operating Expense Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Asset Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Inventory Turnover Rate (times) | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Assets/Total Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Assets/Total Assets | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Debt ratio % | 0.120000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Net worth/Assets | 0.120000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liabilities/Liability | 0.500000 | 0.180000 | 0.060000 | 0.060000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Fixed Assets to Assets | 0.600000 | 0.200000 | 0.080000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital to Total Assets | 1.180000 | 0.460000 | 0.240000 | 0.140000 | 0.140000 | 0.060000 | 0.040000 | 0.020000 | 0.000000 | 0.000000 |
| Current Liability to Assets | 1.300000 | 0.580000 | 0.520000 | 0.420000 | 0.240000 | 0.120000 | 0.060000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital/Equity | 1.440000 | 0.980000 | 0.960000 | 0.800000 | 0.700000 | 0.660000 | 0.620000 | 0.480000 | 0.300000 | 0.280000 |
| Tax rate (A) | 1.740000 | 1.500000 | 1.360000 | 1.260000 | 1.160000 | 1.000000 | 0.940000 | 0.660000 | 0.420000 | 0.380000 |
| Average Collection Days | 2.180000 | 1.600000 | 1.320000 | 1.060000 | 0.980000 | 0.900000 | 0.740000 | 0.560000 | 0.400000 | 0.360000 |
| Total | 8.170000 | 5.080000 | 4.200000 | 3.460000 | 2.960000 | 2.600000 | 2.340000 | 1.700000 | 1.120000 | 1.020000 |
plt.figure(figsize=(12,10),dpi=200)
sns.lineplot(y=outlier_indices[0].loc["Total"].values,x=outlier_indices[0].columns)
graph_label_title("IQR scale to handle Outlier ",'Percentage of data lost','IQR Scale vs % of data Lost')
# outlier_indices[0].iloc[:,[1]].values
# CAUTION: despite its name, this function removes (drops) outlier rows rather than imputing them,
# and it only does so for the majority class
def impute_outliers(data,IQR_constant,cols):
    '''
    Drops the majority-class rows that fall outside the IQR fences of any feature in cols.
    data:dataframe containing the features and the "Bankrupt?" label
    IQR_constant:IQR scale for outlier detection
    cols:features to check
    TYPE:
    data:dataframe
    IQR_constant:float
    cols:list
    Returns:
    This function returns the dataframe with the majority-class outlier rows removed
    CAUTION: rows of the minority class are never removed
    '''
    # Separating the dataframe according to the class (minority class or majority class)
    condition_0=data["Bankrupt?"]==0
    # Store the majority class (copied so that dropping rows does not touch the input dataframe)
    data_0=data[condition_0].copy()
    # Store the minority class
    data_1=data[~condition_0]
    # Initializing the set to store the indices of the outliers
    outlier_indices=set()
    # Iterating over the features
    for i in cols:
        # Calculating the first quartile (on the full dataframe, both classes)
        first_quantile=data[i].quantile(.25)
        # Calculating the third quartile (on the full dataframe, both classes)
        third_quantile=data[i].quantile(.75)
        # Calculating the IQR
        IQR=third_quantile-first_quantile
        # Calculating the lower bound
        lower_limit=first_quantile - IQR*IQR_constant
        # Calculating the upper bound
        upper_limit=third_quantile + IQR*IQR_constant
        # Filtering the majority class based on the upper bound
        upper_outliers=data_0.loc[data_0[i] > upper_limit,cols]
        # Adding the indices of the upper outliers to the set
        outlier_indices.update(upper_outliers.index)
        # Removing the outliers from the majority class based on the upper bound
        data_0.drop(upper_outliers.index,inplace=True)
        # Filtering the majority class based on the lower bound
        lower_outlier=data_0.loc[data_0[i] < lower_limit,cols]
        # Adding the indices of the lower outliers to the set
        outlier_indices.update(lower_outlier.index)
        # Removing the outliers from the majority class based on the lower bound
        data_0.drop(lower_outlier.index,inplace=True)
    return pd.concat([data_0,data_1])
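The effect of `impute_outliers` is easiest to see on a handful of rows. The sketch below is an illustration under stated assumptions, not the notebook's implementation: `drop_majority_outliers` and the toy `rows` are hypothetical, a single feature is checked, and plain dictionaries stand in for the dataframe. As in the function above, the fences are computed from all rows, but only rows with `Bankrupt? == 0` can be removed.

```python
from statistics import quantiles

def drop_majority_outliers(rows, feature, iqr_constant, label="Bankrupt?"):
    """Drop label-0 rows whose feature value lies outside the IQR fences."""
    # Fences are computed from every row, regardless of class.
    values = [row[feature] for row in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - iqr_constant * iqr
    upper = q3 + iqr_constant * iqr
    # Minority-class rows (label 1) are always kept.
    return [
        row for row in rows
        if row[label] == 1 or lower <= row[feature] <= upper
    ]

# Toy data: nine majority-class rows and one extreme minority-class row.
rows = [{"x": v, "Bankrupt?": 0} for v in (1, 2, 3, 4, 5, 6, 7, 8, 100)]
rows.append({"x": 200, "Bankrupt?": 1})

kept = drop_majority_outliers(rows, "x", 1.5)
print([row["x"] for row in kept])
```

The majority-class row with `x = 100` is removed, while the minority-class row with `x = 200` survives even though it is further from the fences. Keeping every minority-class row preserves the few bankruptcy examples in an imbalanced dataset.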
# Storing the 33 features with the lowest outlier occurrence
first_33_col=list(ascendingly_order_outlier_occr_feat[:33])
gather_outliers(bank_data_stage_2[bank_data_stage_2['Bankrupt?']==0],first_33_col,False,IQR_const=[2.2,2.5,2.9,3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9,4.0,4.1])[0].style.bar()
| | 2.2 * IQR | 2.5 * IQR | 2.9 * IQR | 3 * IQR | 3.1 * IQR | 3.2 * IQR | 3.3 * IQR | 3.4 * IQR | 3.5 * IQR | 3.6 * IQR | 3.7 * IQR | 3.8 * IQR | 3.9 * IQR | 4.0 * IQR | 4.1 * IQR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cash Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Operating Expense Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Asset Turnover Rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Inventory Turnover Rate (times) | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Assets/Total Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Quick Assets/Total Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Debt ratio % | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Net worth/Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liabilities/Liability | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Fixed Assets to Assets | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital to Total Assets | 0.040000 | 0.020000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liability to Assets | 0.060000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Working Capital/Equity | 0.620000 | 0.480000 | 0.300000 | 0.280000 | 0.260000 | 0.240000 | 0.240000 | 0.220000 | 0.220000 | 0.220000 | 0.220000 | 0.220000 | 0.200000 | 0.200000 | 0.200000 |
| Tax rate (A) | 0.940000 | 0.660000 | 0.420000 | 0.380000 | 0.320000 | 0.260000 | 0.240000 | 0.240000 | 0.160000 | 0.100000 | 0.100000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Average Collection Days | 0.740000 | 0.560000 | 0.400000 | 0.360000 | 0.340000 | 0.280000 | 0.260000 | 0.260000 | 0.260000 | 0.220000 | 0.220000 | 0.200000 | 0.180000 | 0.160000 | 0.160000 |
| Research and development expense rate | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| Current Liability to Current Assets | 1.920000 | 1.620000 | 1.300000 | 1.280000 | 1.200000 | 1.140000 | 1.060000 | 1.000000 | 0.960000 | 0.860000 | 0.820000 | 0.740000 | 0.680000 | 0.660000 | 0.600000 |
| Borrowing dependency | 1.740000 | 1.360000 | 1.020000 | 0.960000 | 0.920000 | 0.860000 | 0.800000 | 0.780000 | 0.720000 | 0.680000 | 0.580000 | 0.560000 | 0.540000 | 0.520000 | 0.440000 |
| Interest-bearing debt interest rate | 2.320000 | 2.000000 | 1.720000 | 1.680000 | 1.680000 | 1.620000 | 1.600000 | 1.560000 | 1.520000 | 1.520000 | 1.500000 | 1.480000 | 1.460000 | 1.440000 | 1.360000 |
| Operating Gross Margin | 2.580000 | 2.180000 | 1.360000 | 1.180000 | 1.100000 | 1.040000 | 0.980000 | 0.820000 | 0.780000 | 0.660000 | 0.540000 | 0.460000 | 0.260000 | 0.240000 | 0.220000 |
| CFO to Assets | 1.640000 | 1.020000 | 0.700000 | 0.660000 | 0.580000 | 0.540000 | 0.540000 | 0.520000 | 0.500000 | 0.500000 | 0.460000 | 0.440000 | 0.420000 | 0.400000 | 0.400000 |
| Total debt/Total net worth | 2.680000 | 2.020000 | 1.480000 | 1.360000 | 1.320000 | 1.260000 | 1.180000 | 1.060000 | 1.000000 | 0.960000 | 0.920000 | 0.860000 | 0.800000 | 0.720000 | 0.680000 |
| Equity to Long-term Liability | 2.620000 | 2.060000 | 1.520000 | 1.480000 | 1.360000 | 1.340000 | 1.160000 | 1.100000 | 1.000000 | 0.920000 | 0.880000 | 0.820000 | 0.760000 | 0.720000 | 0.660000 |
| ROA(C) before interest and depreciation before interest | 1.900000 | 1.300000 | 0.700000 | 0.700000 | 0.600000 | 0.500000 | 0.420000 | 0.400000 | 0.300000 | 0.300000 | 0.280000 | 0.260000 | 0.220000 | 0.200000 | 0.200000 |
| Total expense/Assets | 2.880000 | 2.300000 | 1.740000 | 1.660000 | 1.540000 | 1.420000 | 1.340000 | 1.260000 | 1.160000 | 1.100000 | 1.020000 | 0.960000 | 0.880000 | 0.800000 | 0.780000 |
| Total Asset Turnover | 2.860000 | 2.240000 | 1.620000 | 1.460000 | 1.380000 | 1.280000 | 1.200000 | 1.100000 | 1.000000 | 0.980000 | 0.920000 | 0.900000 | 0.840000 | 0.720000 | 0.660000 |
| Inventory and accounts receivable/Net value | 3.000000 | 2.220000 | 1.580000 | 1.480000 | 1.380000 | 1.300000 | 1.220000 | 1.200000 | 1.100000 | 1.040000 | 0.960000 | 0.860000 | 0.760000 | 0.740000 | 0.640000 |
| Inventory/Current Liability | 4.080000 | 3.620000 | 3.100000 | 3.020000 | 2.940000 | 2.920000 | 2.800000 | 2.780000 | 2.720000 | 2.640000 | 2.580000 | 2.560000 | 2.500000 | 2.480000 | 2.480000 |
| Net Value Per Share (B) | 3.860000 | 3.200000 | 2.600000 | 2.480000 | 2.360000 | 2.220000 | 2.040000 | 1.980000 | 1.800000 | 1.700000 | 1.540000 | 1.500000 | 1.360000 | 1.240000 | 1.160000 |
| Persistent EPS in the Last Four Seasons | 3.620000 | 2.760000 | 2.080000 | 1.940000 | 1.880000 | 1.720000 | 1.580000 | 1.460000 | 1.380000 | 1.280000 | 1.220000 | 1.120000 | 1.080000 | 1.040000 | 0.960000 |
| Total income/Total expense | 4.000000 | 3.160000 | 2.560000 | 2.460000 | 2.260000 | 2.100000 | 2.040000 | 1.920000 | 1.820000 | 1.780000 | 1.760000 | 1.720000 | 1.660000 | 1.560000 | 1.460000 |
| Revenue Per Share (Yuan ¥) | 4.720000 | 4.080000 | 3.300000 | 3.120000 | 3.020000 | 2.880000 | 2.760000 | 2.680000 | 2.480000 | 2.420000 | 2.280000 | 2.080000 | 2.020000 | 1.920000 | 1.880000 |
| Net Income to Stockholder's Equity | 3.620000 | 2.860000 | 2.100000 | 2.040000 | 1.940000 | 1.860000 | 1.740000 | 1.600000 | 1.540000 | 1.420000 | 1.320000 | 1.280000 | 1.180000 | 1.120000 | 1.100000 |
| Total | 29.480000 | 24.700000 | 19.840000 | 18.920000 | 17.990000 | 17.230000 | 16.350000 | 15.650000 | 14.790000 | 13.950000 | 13.350000 | 12.650000 | 12.010000 | 11.510000 | 10.990000 |
def check_best_IQR_scale(model,sampling_method,IQR_range=[1.5,1.6,1.7,1.8,1.9,2.0,2.1,2.2,2.5]):
    '''
    model: estimator
    sampling_method: over-sampling (or other resampling) object that provides fit_resample
    IQR_range: IQR scales to evaluate
    This function iterates through the IQR scales, removes the majority-class outliers at each scale,
    and stores the model performance on the resulting data
    This function returns the model performance for all the IQR scales as a DataFrame (one row per scale)
    '''
    # Dictionary to store the model evaluation metrics for each IQR scale
    data_dict={}
    # Iterating over the IQR scales supplied
    for i in IQR_range:
        # Dataframe with the majority-class outliers beyond the fences at IQR scale i removed (for example 1.5, 1.6, 1.7 and so on)
        dataframe=impute_outliers(bank_data_stage_2,i,list(ascendingly_order_outlier_occr_feat[:33]))
        # Storing the model performance on the data filtered with this IQR scale
        data_dict[i]=get_model_performance(model,dataframe,sampling_method)
    # Returning the model performance for all the IQR scales supplied
    return pd.DataFrame(data_dict).T
def get_model_performance(model, dataframe, sampling_method, display_report=False, custom_featrues=False, return_trained_model=False):
    '''
    Fits the model on the scaled, over-sampled training data and evaluates it
    on the global test split (X_test, y_test).

    model: estimator object
    dataframe: DataFrame holding the training features and the "Bankrupt?" label
    sampling_method: over-sampling object (for example ADASYN)
    display_report: boolean, whether to print the metrics, classification report and confusion matrix
    custom_featrues: False to use all features, or a list of feature names to use
    return_trained_model: boolean, whether to also return the fitted model and the scaled test features

    Returns the model performance as a dictionary, plus the fitted model and
    the scaled test features when return_trained_model is True.
    '''
    # Separate the input features from the label
    X_train = dataframe.drop("Bankrupt?", axis=1)
    y_train = dataframe["Bankrupt?"]
    # If custom features are supplied, restrict both splits to them; otherwise use all features
    if custom_featrues is not False:
        X_test_ = X_test[custom_featrues]
        X_train = X_train[custom_featrues]
    else:
        X_test_ = X_test
    # Scaling the data: fit on the training split only, then apply to the test split
    scaller = StandardScaler()
    X_train_ = scaller.fit_transform(X_train)
    # Over-sampling only the training data to balance the classes
    X_train_, y_train_ = sampling_method.fit_resample(X_train_, y_train)
    X_test_ = scaller.transform(X_test_)
    # Fitting the model
    model.fit(X_train_, y_train_)
    # Predicting on the test split
    predictions = model.predict(X_test_)
    # Calculating the model performance
    f1 = f1_score(y_test, predictions)
    pre = precision_score(y_test, predictions)
    reca = recall_score(y_test, predictions)
    acc = accuracy_score(y_test, predictions)
    # Displaying the metrics
    if display_report:
        print(f1, 'f1score')
        print(acc, 'accuracy')
        print(pre, 'precision')
        print(reca, 'recall')
        print(classification_report(y_test, predictions))
        print(confusion_matrix(y_test, predictions))
    metrics = {'f1_score': f1, 'precision': pre, 'recall': reca, 'accuracy': acc}
    if return_trained_model:
        return metrics, model, X_test_
    return metrics
def print_metrics(predictions):
    '''
    predictions: numpy array of predicted labels for the test split

    Prints the metrics of the predictions against the global y_test and
    returns them as a dictionary.
    '''
    # Calculating the metrics
    f1 = f1_score(y_test, predictions)
    pre = precision_score(y_test, predictions)
    reca = recall_score(y_test, predictions)
    acc = accuracy_score(y_test, predictions)
    # Displaying the metrics
    print(f1, 'f1score')
    print(acc, 'accuracy')
    print(pre, 'precision')
    print(reca, 'recall')
    print(classification_report(y_test, predictions))
    print(confusion_matrix(y_test, predictions))
    return {'f1_score': f1, 'precision': pre, 'recall': reca, 'accuracy': acc}
def plot_best_score(data, model_name):
    # Row(s) with the highest F1-score
    best_f1_value = data["f1_score"].max()
    print(best_f1_value)
    max_score = data[data["f1_score"] == best_f1_value]
    display(max_score)
    # Scalar metrics of the first best row (local names chosen so they do not shadow sklearn's f1_score)
    best_precision = max_score["precision"].iloc[0]
    best_recall = max_score["recall"].iloc[0]
    best_f1 = max_score["f1_score"].iloc[0]
    max_index = max_score.index[0]
    plt.figure(figsize=(10, 5), dpi=200)  # for better resolution
    # Plotting the metrics of the model to visualize the performance
    plt.plot(data.index, data["f1_score"], 'y', label="f1_score", linewidth=5)
    plt.plot(data.index, data["recall"], 'r', label="recall", linewidth=5)
    plt.plot(data.index, data["precision"], 'g', label="precision", linewidth=5)
    # Highlighting the best point; each dashed line uses the colour of its metric curve
    plt.axvline(x=max_index, label="Best setting (max F1)", color="orange")
    plt.axhline(y=best_f1, label="best f1_score", color="y", linestyle="--")
    plt.axhline(y=best_precision, label="best precision", color="g", linestyle="--")
    plt.axhline(y=best_recall, label="best recall", color="r", linestyle="--")
    graph_label_title('IQR Scale value', "scale for metrics ", model_name)
    plt.legend()
    plt.show()
    # Displaying the metrics for every setting tested, with the maximum of each column highlighted
    display(data.style.highlight_max())
# Tuning the model based on IQR scale and checking the best performance
chart_IQR_scale_tune_XGB=check_best_IQR_scale(XGBClassifier(),ADASYN(random_state=42,n_neighbors=5,n_jobs=-1),[2.0,2.2,2.3,2.5,3.3,3.4,3.7,3.8,3.9])
[12:37:16] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
plot_best_score(chart_IQR_scale_tune_XGB,"Tuning by IQR scale for XGboost Model")
0.5299145299145299
| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 3.8 | 0.529915 | 0.424658 | 0.704545 | 0.957496 |

| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 2.0 | 0.402299 | 0.269231 | 0.795455 | 0.919629 |
| 2.2 | 0.420382 | 0.292035 | 0.750000 | 0.929675 |
| 2.3 | 0.412500 | 0.284483 | 0.750000 | 0.927357 |
| 2.5 | 0.394737 | 0.277778 | 0.681818 | 0.928903 |
| 3.3 | 0.462810 | 0.363636 | 0.636364 | 0.949768 |
| 3.4 | 0.525424 | 0.418919 | 0.704545 | 0.956723 |
| 3.7 | 0.512397 | 0.402597 | 0.704545 | 0.954405 |
| 3.8 | 0.529915 | 0.424658 | 0.704545 | 0.957496 |
| 3.9 | 0.521739 | 0.422535 | 0.681818 | 0.957496 |
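`impute_outliers` is defined elsewhere in the notebook and is not shown in this section. As a rough illustration of what the IQR scale in the table controls, the sketch below places limits at `scale` times the IQR beyond Q1 and Q3 and clips values to them. The helper names `iqr_bounds` and `clip_outliers`, the clipping strategy, and the sample values are assumptions made for illustration; the notebook's own outlier treatment may differ.

```python
from statistics import quantiles


def iqr_bounds(values, scale=1.5):
    """Return (lower, upper) limits placed `scale` IQRs beyond Q1 and Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - scale * iqr, q3 + scale * iqr


def clip_outliers(values, scale=1.5):
    """Clip every value to the IQR limits (one possible outlier treatment)."""
    lower, upper = iqr_bounds(values, scale)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample: eight ordinary values and one extreme value
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]

for scale in (1.5, 3.8):
    lower, upper = iqr_bounds(sample, scale)
    # Show the limits and what happens to the extreme value at this scale
    print(scale, (lower, upper), clip_outliers(sample, scale)[-1])
```

A larger scale widens the limits, so fewer values are treated as outliers and more of each feature's tail reaches the model. That trade-off is what the sweep over IQR scales is probing.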
# Storing the data with outliers handled at IQR scale 3.8
data_xgb = impute_outliers(bank_data_stage_2, 3.8, list(ascendingly_order_outlier_occr_feat[:33]))
# Calculating the overall performance of the model
get_model_performance(XGBClassifier(),
                      data_xgb, ADASYN(random_state=42, n_neighbors=5, n_jobs=-1),
                      display_report=True)
0.5299145299145299 f1score
0.9574961360123647 accuracy
0.4246575342465753 precision
0.7045454545454546 recall
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.99 | 0.97 | 0.98 | 1250 |
| 1 | 0.42 | 0.70 | 0.53 | 44 |
| accuracy | | | 0.96 | 1294 |
| macro avg | 0.71 | 0.84 | 0.75 | 1294 |
| weighted avg | 0.97 | 0.96 | 0.96 | 1294 |
[[1208 42]
[ 13 31]]
{'f1_score': 0.5299145299145299,
'precision': 0.4246575342465753,
'recall': 0.7045454545454546,
'accuracy': 0.9574961360123647}
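The metrics above follow directly from the confusion matrix, which scikit-learn lays out as `[[TN, FP], [FN, TP]]` for labels 0 and 1. The short check below recomputes them by hand and adds a baseline that predicts "not bankrupt" for every company; the variable names are illustrative only.

```python
# Confusion matrix printed above, laid out as [[TN, FP], [FN, TP]]
tn, fp = 1208, 42
fn, tp = 13, 31

total = tn + fp + fn + tp
precision = tp / (tp + fp)   # share of predicted bankruptcies that were real
recall = tp / (tp + fn)      # share of real bankruptcies that were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / total

# Baseline for comparison: predict class 0 for all 1294 test rows
baseline_accuracy = (tn + fp) / total

print(round(precision, 6), round(recall, 6), round(f1, 6), round(accuracy, 6))
print(round(baseline_accuracy, 6))
```

With only 44 bankrupt companies among 1294 test rows, the all-negative baseline reaches an accuracy of about 0.966, higher than the model's 0.957, while catching no bankruptcies at all. This is why F1-score and recall, not accuracy, are used to compare settings here.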
(Adding the highly ranked features one by one, starting from the top, and checking whether the model performance rises.)
del f1_score
# Storing the data with outliers handled at IQR scale 3.8
data_xgb = impute_outliers(bank_data_stage_2, 3.8, list(ascendingly_order_outlier_occr_feat[:33]))
# Dictionary to store the model performance metrics, keyed by number of features
feature_selection_tune_xgb = {}
# Starting from the 49 highest-ranked features
features_ = list(feat_score['columns'].values[:49])
# Adding the 50th to 55th ranked features one at a time
for i in feat_score['columns'].values[49:55]:
    # Adding the next ranked feature to the list of features
    features_.append(i)
    # Storing the model performance metrics for the current feature set
    feature_selection_tune_xgb[len(features_)] = get_model_performance(
        XGBClassifier(),
        data_xgb,
        ADASYN(random_state=42, n_neighbors=5, n_jobs=-1),
        display_report=False,
        custom_featrues=features_)
# Converting the dictionary of metrics into a DataFrame, one row per feature count
feature_selection_tune_xgb_=pd.DataFrame(feature_selection_tune_xgb).T
plot_best_score(feature_selection_tune_xgb_,"Tuning by Feature Selection for XGboost Model")
0.5454545454545454
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 52 | 0.545455 | 0.428571 | 0.750000 | 0.957496 |

| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 50 | 0.512000 | 0.395062 | 0.727273 | 0.952859 |
| 51 | 0.504065 | 0.392405 | 0.704545 | 0.952859 |
| 52 | 0.545455 | 0.428571 | 0.750000 | 0.957496 |
| 53 | 0.504202 | 0.400000 | 0.681818 | 0.954405 |
| 54 | 0.500000 | 0.402778 | 0.659091 | 0.955178 |
| 55 | 0.529915 | 0.424658 | 0.704545 | 0.957496 |
From the table above we can conclude that selecting only the 52 highest-ranked features out of 73 gives the best F1-score among the feature counts tested (50 to 55).
Compared with the earlier run at IQR scale 3.8 without feature selection, the F1-score rises from 0.530 to 0.545 and recall from 0.705 to 0.750, while precision stays nearly constant (0.425 to 0.429).
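The incremental search used above can be written as a small reusable helper. The sketch below is illustrative only: `incremental_feature_search` and `evaluate` are hypothetical names, the feature names are placeholders, and the stand-in `evaluate` simply looks up the F1-scores from the table instead of training a model.

```python
def incremental_feature_search(ranked_features, start, stop, evaluate):
    """Score the top-k ranked features for k = start + 1 .. stop.

    `evaluate` is a hypothetical callable that takes a list of feature names
    and returns a dict of metrics containing at least "f1_score".
    """
    results = {}
    for k in range(start + 1, stop + 1):
        subset = list(ranked_features[:k])
        results[k] = evaluate(subset)
    # Pick the feature count with the highest F1-score
    best_k = max(results, key=lambda k: results[k]["f1_score"])
    return best_k, results


# F1-scores copied from the table above, used as a stand-in for real training
f1_by_count = {50: 0.512000, 51: 0.504065, 52: 0.545455,
               53: 0.504202, 54: 0.500000, 55: 0.529915}


def lookup_evaluate(subset):
    """Stand-in evaluator: return the tabulated F1 for this subset size."""
    return {"f1_score": f1_by_count[len(subset)]}


ranked = [f"feature_{i}" for i in range(1, 56)]  # placeholder feature names
best_k, results = incremental_feature_search(ranked, 49, 55, lookup_evaluate)
print(best_k)
```

In the notebook, the role of `evaluate` would be played by a call to `get_model_performance` with `custom_featrues` set to the current subset.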
# Storing the features ranked 1 to 52
features_for_XgBoost = list(feat_score['columns'].values[:52])
# Storing the data with outliers handled at IQR scale 3.8
data_xgb = impute_outliers(bank_data_stage_2, 3.8, list(ascendingly_order_outlier_occr_feat[:33]))
# Checking the overall performance of the model on the 52 selected features
get_model_performance(XGBClassifier(),
                      data_xgb, ADASYN(random_state=42, n_neighbors=5, n_jobs=-1),
                      display_report=True,
                      custom_featrues=features_for_XgBoost)
0.5454545454545454 f1score
0.9574961360123647 accuracy
0.42857142857142855 precision
0.75 recall
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.99 | 0.96 | 0.98 | 1250 |
| 1 | 0.43 | 0.75 | 0.55 | 44 |
| accuracy | | | 0.96 | 1294 |
| macro avg | 0.71 | 0.86 | 0.76 | 1294 |
| weighted avg | 0.97 | 0.96 | 0.96 | 1294 |
[[1206 44]
[ 11 33]]
{'f1_score': 0.5454545454545454,
'precision': 0.42857142857142855,
'recall': 0.75,
'accuracy': 0.9574961360123647}
# Storing the training data with the settings that gave the best XGBoost performance
X_train = data_xgb[features_for_XgBoost]
y_train = data_xgb["Bankrupt?"]
# Scaling the data
scaller = StandardScaler()
X_train_ = scaller.fit_transform(X_train)
X_test_ = scaller.transform(X_test[features_for_XgBoost])
# Over-sampling the training data to balance the classes
X_train_, y_train_ = ADASYN(random_state=42, n_neighbors=5, n_jobs=-1).fit_resample(X_train_, y_train)
# Storing the parameter grid for tuning the XGBoost model
param_grid = {'n_estimators': [100, 200, 300], 'learning_rate': [0.2, 0.3, 0.4, 0.5, 0.6], 'max_depth': [2, 3, 4, 5, 6]}
model = XGBClassifier()
# Creating an instance of RandomizedSearchCV with the parameter grid and the model
grid = RandomizedSearchCV(estimator=model, param_distributions=param_grid,
                          cv=5, verbose=1,
                          scoring='f1')
# Fitting the search
grid.fit(X_train_, y_train_)
print('\n Best Parameters\n', grid.best_params_)
# Keeping the best estimator found by the search
model = grid.best_estimator_
# Printing the model evaluation metrics on the test set
print_metrics(predictions=model.predict(X_test_))
Fitting 5 folds for each of 10 candidates, totalling 50 fits
Best Parameters
{'n_estimators': 300, 'max_depth': 5, 'learning_rate': 0.2}
0.5299145299145299 f1score
0.9574961360123647 accuracy
0.4246575342465753 precision
0.7045454545454546 recall
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.99 | 0.97 | 0.98 | 1250 |
| 1 | 0.42 | 0.70 | 0.53 | 44 |
| accuracy | | | 0.96 | 1294 |
| macro avg | 0.71 | 0.84 | 0.75 | 1294 |
| weighted avg | 0.97 | 0.96 | 0.96 | 1294 |
[[1208 42]
[ 13 31]]
{'f1_score': 0.5299145299145299,
'precision': 0.4246575342465753,
'recall': 0.7045454545454546,
'accuracy': 0.9574961360123647}
1. Feature selection method
2. IQR scale
3. Model parameter tuning
   a. n_estimators: [100, 200, 300]
   b. learning_rate: [0.2, 0.3, 0.4, 0.5, 0.6]
   c. max_depth: [2, 3, 4, 5, 6]
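The parameter lists above define 3 × 5 × 5 = 75 combinations, but the search log reports only 10 candidates because `RandomizedSearchCV` draws `n_iter=10` combinations by default. The sketch below uses only the standard library to illustrate that sampling idea; it is not scikit-learn's own sampler, and the seed is an arbitrary choice.

```python
import random
from itertools import product

param_grid = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.2, 0.3, 0.4, 0.5, 0.6],
    "max_depth": [2, 3, 4, 5, 6],
}

# Every combination an exhaustive grid search would try
names = list(param_grid)
all_candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# A randomized search draws a fixed number of them without replacement;
# n_iter=10 matches the "10 candidates" in the log, the seed is an assumption
n_iter = 10
rng = random.Random(42)
sampled = rng.sample(all_candidates, n_iter)

print(len(all_candidates), len(sampled))
```

Because only a subset of the grid is evaluated and the search in the notebook sets no `random_state`, the reported best parameters can change from run to run.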
1. Feature selection method
2. IQR scale
# Storing the model performance for IQR scale values from 2.1 to 3.0
chart_IQR_scale_tune_RF = check_best_IQR_scale(RandomForestClassifier(),
                                               ADASYN(random_state=42, n_neighbors=5, n_jobs=-1),
                                               [2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0])
plot_best_score(chart_IQR_scale_tune_RF,"Tuning by IQR scale for Random Forest model")
0.46357615894039733
| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 2.4 | 0.463576 | 0.327103 | 0.795455 | 0.937403 |
| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 2.100000 | 0.423077 | 0.294643 | 0.750000 | 0.930448 |
| 2.200000 | 0.438710 | 0.306306 | 0.772727 | 0.932767 |
| 2.300000 | 0.453333 | 0.320755 | 0.772727 | 0.936631 |
| 2.400000 | 0.463576 | 0.327103 | 0.795455 | 0.937403 |
| 2.500000 | 0.447552 | 0.323232 | 0.727273 | 0.938949 |
| 2.600000 | 0.455172 | 0.326733 | 0.750000 | 0.938949 |
| 2.700000 | 0.428571 | 0.312500 | 0.681818 | 0.938176 |
| 2.800000 | 0.421875 | 0.321429 | 0.613636 | 0.942813 |
| 2.900000 | 0.447761 | 0.333333 | 0.681818 | 0.942813 |
| 3.000000 | 0.449275 | 0.329787 | 0.704545 | 0.941267 |
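The IQR scale swept above is presumably the multiplier k that places the outlier fences at Q1 - k*IQR and Q3 + k*IQR. The notebook's `impute_outliers` helper is defined outside this section, so the sketch below is only an assumption about one common treatment, clipping values to the fences. The function name `clip_to_iqr_fences` and the sample values are hypothetical.

```python
from statistics import quantiles

def clip_to_iqr_fences(values, scale):
    """Clip values outside [Q1 - scale*IQR, Q3 + scale*IQR] to the nearest fence."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - scale * iqr, q3 + scale * iqr
    return [min(max(v, lower), upper) for v in values]

# Hypothetical ratio column with one extreme value at the end.
sample = [0.48, 0.50, 0.51, 0.52, 0.53, 0.55, 0.56, 0.95]

for scale in (1.5, 2.4, 10):
    print(scale, clip_to_iqr_fences(sample, scale))
```

A larger scale widens the fences, so fewer values are treated as outliers and the extreme value is pulled in less (or not at all).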
# Storing best IQR scale value that gives best performance for Random Forest model
IQR_scale=2.4
# Storing the data with outliers treated at the best IQR scale (2.4)
data_rf=impute_outliers(bank_data_stage_2,IQR_scale,list(ascendingly_order_outlier_occr_feat[:33]))
# Checking the overall performance of the model with best IQR scale
get_model_performance(RandomForestClassifier(),
data_rf,ADASYN(random_state=42,n_neighbors=5,n_jobs=-1)
,display_report=True)
0.44295302013422816 f1score
0.9358578052550232 accuracy
0.3142857142857143 precision
0.75 recall
precision recall f1-score support
0 0.99 0.94 0.97 1250
1 0.31 0.75 0.44 44
accuracy 0.94 1294
macro avg 0.65 0.85 0.70 1294
weighted avg 0.97 0.94 0.95 1294
[[1178 72]
[ 11 33]]
{'f1_score': 0.44295302013422816,
'precision': 0.3142857142857143,
'recall': 0.75,
'accuracy': 0.9358578052550232}
(Adding the highly ranked features one by one from the top and checking whether the model performance rises.)
# Initializing dictionary object to store the model performance metrics
feature_selection_tune_rf={}
# Storing first 60 ranked features
features_=list(feat_score['columns'].values[:60])
# Iterating through the features from rank 61 to the last ranked feature
for i in feat_score['columns'].values[60:]:
    # Adding next ranked feature to the feature list
    features_.append(i)
    # Storing the model performance metrics for the features
    feature_selection_tune_rf[len(features_)]=get_model_performance(RandomForestClassifier()
                                                ,data_rf,ADASYN(
                                                    random_state=42,
                                                    n_neighbors=5,
                                                    n_jobs=-1),
                                                display_report=False,
                                                custom_featrues=features_)
feature_selection_tune_rf_=pd.DataFrame(feature_selection_tune_rf).T
plot_best_score(feature_selection_tune_rf_,"Tuning by Feature Selection for Random Forest model")
0.45333333333333337
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 69 | 0.453333 | 0.320755 | 0.772727 | 0.936631 |
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 61 | 0.408163 | 0.291262 | 0.681818 | 0.932767 |
| 62 | 0.419580 | 0.303030 | 0.681818 | 0.935858 |
| 63 | 0.386667 | 0.273585 | 0.659091 | 0.928903 |
| 64 | 0.418919 | 0.298077 | 0.704545 | 0.933539 |
| 65 | 0.413793 | 0.297030 | 0.681818 | 0.934312 |
| 66 | 0.413793 | 0.297030 | 0.681818 | 0.934312 |
| 67 | 0.434211 | 0.305556 | 0.750000 | 0.933539 |
| 68 | 0.432432 | 0.307692 | 0.727273 | 0.935085 |
| 69 | 0.453333 | 0.320755 | 0.772727 | 0.936631 |
| 70 | 0.410959 | 0.294118 | 0.681818 | 0.933539 |
| 71 | 0.438356 | 0.313725 | 0.727273 | 0.936631 |
| 72 | 0.427586 | 0.306931 | 0.704545 | 0.935858 |
| 73 | 0.438356 | 0.313725 | 0.727273 | 0.936631 |
The feature selection method does not increase the model's performance much.
CAUTION: The performance of XGBoost fluctuates a lot for the same parameter values.
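The same kind of run-to-run variation is visible in the Random Forest results above: at IQR scale 2.4 the sweep reported a recall of 0.795455, while the rerun with the same settings reported 0.75. With only 44 bankrupt companies in the test set, every extra hit or miss moves recall by about 0.023, as the arithmetic below illustrates. The counts are read from the outputs above. Passing a fixed `random_state` to the classifier is one possible way to make reruns repeatable, which the calls shown here do not do.

```python
from fractions import Fraction

positives = 44                   # bankrupt companies in the test set (support of class 1)
step = Fraction(1, positives)    # change in recall caused by a single test case

sweep_recall = Fraction(35, 44)  # 0.795455 reported in the IQR sweep at scale 2.4
rerun_recall = Fraction(33, 44)  # 0.75 reported when the same setup was rerun

# Number of bankrupt companies classified differently between the two runs.
flipped_cases = (sweep_recall - rerun_recall) / step

print(float(step), float(sweep_recall), float(rerun_recall), flipped_cases)
```

Differences of a few hundredths in recall or F1 between rows of the tables therefore correspond to only one or two companies and should be read with care.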
features_for_Rf=list(feat_score['columns'].values[:69])
get_model_performance(RandomForestClassifier(),
data_rf,ADASYN(random_state=42,n_neighbors=5,n_jobs=-1)
,display_report=True,custom_featrues=features_for_Rf)
0.45070422535211263 f1score
0.9397217928902627 accuracy
0.32653061224489793 precision
0.7272727272727273 recall
precision recall f1-score support
0 0.99 0.95 0.97 1250
1 0.33 0.73 0.45 44
accuracy 0.94 1294
macro avg 0.66 0.84 0.71 1294
weighted avg 0.97 0.94 0.95 1294
[[1184 66]
[ 12 32]]
{'f1_score': 0.45070422535211263,
'precision': 0.32653061224489793,
'recall': 0.7272727272727273,
'accuracy': 0.9397217928902627}
# Storing X_train set and y_train set for Random Forest model
X_train=data_rf.drop("Bankrupt?",axis=1)
y_train=data_rf["Bankrupt?"]
# Scaling the data
scaller=StandardScaler()
X_train_=scaller.fit_transform(X_train)
X_test_=scaller.transform(X_test)
# Oversampling the training data
X_train_,y_train_=ADASYN(random_state=42,n_neighbors=5,n_jobs=-1).fit_resample(X_train_,y_train)
# Defining the hyperparameter search space for the Random Forest model
# ('auto' was an alias of 'sqrt' for classifiers and is rejected by scikit-learn 1.3 and later)
param_grid={'bootstrap': [True, False],
            'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
            'max_features': ['auto', 'sqrt'],
            'min_samples_leaf': [1, 2, 4],
            'min_samples_split': [2, 5, 10],
            'n_estimators': [30,40,50,70,80,100,1400, 1600, 1800, 2000]}
# Initializing the Random Forest model
model=RandomForestClassifier()
# Initializing the Randomized Search CV object
# (for 0/1 labels, neg_mean_absolute_error is the negative misclassification rate)
grid=RandomizedSearchCV(estimator=model,param_distributions=param_grid,
                        cv=5,verbose=10,
                        scoring='neg_mean_absolute_error')
# Fitting the search
grid.fit(X_train_,y_train_)
print(grid.best_params_)
# Storing the best estimator found by the search
model=grid.best_estimator_
# Printing the test-set metrics of the best estimator
print_metrics(predictions=model.predict(X_test_))
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5; 1/10] START bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40
[CV 1/5; 1/10] END bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40;, score=-0.061 total time= 1.7s
[CV 2/5; 1/10] START bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40
[CV 2/5; 1/10] END bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40;, score=-0.044 total time= 1.7s
[CV 3/5; 1/10] START bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40
[CV 3/5; 1/10] END bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40;, score=-0.051 total time= 2.2s
[CV 4/5; 1/10] START bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40
[CV 4/5; 1/10] END bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40;, score=-0.098 total time= 1.8s
[CV 5/5; 1/10] START bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40
[CV 5/5; 1/10] END bootstrap=False, max_depth=60, max_features=sqrt, min_samples_leaf=4, min_samples_split=2, n_estimators=40;, score=-0.038 total time= 2.6s
[CV 1/5; 2/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40
[CV 1/5; 2/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40;, score=-0.054 total time= 1.8s
[CV 2/5; 2/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40
[CV 2/5; 2/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40;, score=-0.051 total time= 1.3s
[CV 3/5; 2/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40
[CV 3/5; 2/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40;, score=-0.052 total time= 1.1s
[CV 4/5; 2/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40
[CV 4/5; 2/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40;, score=-0.100 total time= 1.2s
[CV 5/5; 2/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40
[CV 5/5; 2/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=40;, score=-0.042 total time= 1.1s
[CV 1/5; 3/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70
[CV 1/5; 3/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70;, score=-0.063 total time= 2.1s
[CV 2/5; 3/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70
[CV 2/5; 3/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70;, score=-0.043 total time= 2.8s
[CV 3/5; 3/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70
[CV 3/5; 3/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70;, score=-0.049 total time= 2.5s
[CV 4/5; 3/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70
[CV 4/5; 3/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70;, score=-0.086 total time= 2.7s
[CV 5/5; 3/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70
[CV 5/5; 3/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=1, min_samples_split=2, n_estimators=70;, score=-0.039 total time= 2.2s
[CV 1/5; 4/10] START bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40
[CV 1/5; 4/10] END bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40;, score=-0.065 total time= 1.1s
[CV 2/5; 4/10] START bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40
[CV 2/5; 4/10] END bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40;, score=-0.047 total time= 1.2s
[CV 3/5; 4/10] START bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40
[CV 3/5; 4/10] END bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40;, score=-0.052 total time= 1.4s
[CV 4/5; 4/10] START bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40
[CV 4/5; 4/10] END bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40;, score=-0.096 total time= 1.5s
[CV 5/5; 4/10] START bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40
[CV 5/5; 4/10] END bootstrap=True, max_depth=40, max_features=auto, min_samples_leaf=2, min_samples_split=2, n_estimators=40;, score=-0.036 total time= 2.7s
[CV 1/5; 5/10] START bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30
[CV 1/5; 5/10] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30;, score=-0.058 total time= 3.2s
[CV 2/5; 5/10] START bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30
[CV 2/5; 5/10] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30;, score=-0.048 total time= 1.9s
[CV 3/5; 5/10] START bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30
[CV 3/5; 5/10] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30;, score=-0.051 total time= 2.5s
[CV 4/5; 5/10] START bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30
[CV 4/5; 5/10] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30;, score=-0.084 total time= 1.6s
[CV 5/5; 5/10] START bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30
[CV 5/5; 5/10] END bootstrap=False, max_depth=None, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=30;, score=-0.038 total time= 1.3s
[CV 1/5; 6/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30
[CV 1/5; 6/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30;, score=-0.057 total time= 1.3s
[CV 2/5; 6/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30
[CV 2/5; 6/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30;, score=-0.049 total time= 0.7s
[CV 3/5; 6/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30
[CV 3/5; 6/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30;, score=-0.056 total time= 0.9s
[CV 4/5; 6/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30
[CV 4/5; 6/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30;, score=-0.093 total time= 0.8s
[CV 5/5; 6/10] START bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30
[CV 5/5; 6/10] END bootstrap=True, max_depth=60, max_features=auto, min_samples_leaf=4, min_samples_split=2, n_estimators=30;, score=-0.044 total time= 0.9s
[CV 1/5; 7/10] START bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30
[CV 1/5; 7/10] END bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30;, score=-0.062 total time= 1.1s
[CV 2/5; 7/10] START bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30
[CV 2/5; 7/10] END bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30;, score=-0.049 total time= 1.2s
[CV 3/5; 7/10] START bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30
[CV 3/5; 7/10] END bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30;, score=-0.050 total time= 0.8s
[CV 4/5; 7/10] START bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30
[CV 4/5; 7/10] END bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30;, score=-0.093 total time= 0.9s
[CV 5/5; 7/10] START bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30
[CV 5/5; 7/10] END bootstrap=True, max_depth=40, max_features=sqrt, min_samples_leaf=1, min_samples_split=10, n_estimators=30;, score=-0.042 total time= 0.9s
[CV 1/5; 8/10] START bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600
[CV 1/5; 8/10] END bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600;, score=-0.058 total time= 1.5min
[CV 2/5; 8/10] START bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600
[CV 2/5; 8/10] END bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600;, score=-0.037 total time= 1.6min
[CV 3/5; 8/10] START bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600
[CV 3/5; 8/10] END bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600;, score=-0.042 total time= 1.4min
[CV 4/5; 8/10] START bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600
[CV 4/5; 8/10] END bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600;, score=-0.088 total time= 1.4min
[CV 5/5; 8/10] START bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600
[CV 5/5; 8/10] END bootstrap=False, max_depth=40, max_features=auto, min_samples_leaf=1, min_samples_split=5, n_estimators=1600;, score=-0.032 total time= 1.3min
[CV 1/5; 9/10] START bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100
[CV 1/5; 9/10] END bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100;, score=-0.058 total time= 2.9s
[CV 2/5; 9/10] START bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100
[CV 2/5; 9/10] END bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100;, score=-0.047 total time= 2.8s
[CV 3/5; 9/10] START bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100
[CV 3/5; 9/10] END bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100;, score=-0.050 total time= 2.8s
[CV 4/5; 9/10] START bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100
[CV 4/5; 9/10] END bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100;, score=-0.087 total time= 2.7s
[CV 5/5; 9/10] START bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100
[CV 5/5; 9/10] END bootstrap=True, max_depth=70, max_features=auto, min_samples_leaf=2, min_samples_split=5, n_estimators=100;, score=-0.037 total time= 2.9s
[CV 1/5; 10/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800
[CV 1/5; 10/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800;, score=-0.062 total time= 53.0s
[CV 2/5; 10/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800
[CV 2/5; 10/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800;, score=-0.048 total time= 50.5s
[CV 3/5; 10/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800
[CV 3/5; 10/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800;, score=-0.054 total time= 51.2s
[CV 4/5; 10/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800
[CV 4/5; 10/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800;, score=-0.094 total time= 50.6s
[CV 5/5; 10/10] START bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800
[CV 5/5; 10/10] END bootstrap=True, max_depth=20, max_features=sqrt, min_samples_leaf=4, min_samples_split=5, n_estimators=1800;, score=-0.035 total time= 53.2s
{'n_estimators': 1600, 'min_samples_split': 5, 'min_samples_leaf': 1, 'max_features': 'auto', 'max_depth': 40, 'bootstrap': False}
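`RandomizedSearchCV` ranks each candidate by its mean score over the five folds and keeps the highest one. As a cross-check, the fold scores printed in the log above can be averaged by hand. The dictionary below is transcribed from that log (scores rounded to three decimals), and the variable names are illustrative.

```python
from statistics import mean

# Fold scores (neg_mean_absolute_error) per candidate, copied from the log above.
fold_scores = {
    1: [-0.061, -0.044, -0.051, -0.098, -0.038],
    2: [-0.054, -0.051, -0.052, -0.100, -0.042],
    3: [-0.063, -0.043, -0.049, -0.086, -0.039],
    4: [-0.065, -0.047, -0.052, -0.096, -0.036],
    5: [-0.058, -0.048, -0.051, -0.084, -0.038],
    6: [-0.057, -0.049, -0.056, -0.093, -0.044],
    7: [-0.062, -0.049, -0.050, -0.093, -0.042],
    8: [-0.058, -0.037, -0.042, -0.088, -0.032],
    9: [-0.058, -0.047, -0.050, -0.087, -0.037],
    10: [-0.062, -0.048, -0.054, -0.094, -0.035],
}

# Mean score per candidate; the least negative mean wins.
mean_scores = {cand: mean(scores) for cand, scores in fold_scores.items()}
best = max(mean_scores, key=mean_scores.get)

print(best, round(mean_scores[best], 4))
```

Candidate 8 in the log is the `n_estimators=1600`, `max_depth=40`, `bootstrap=False` configuration, which matches the best parameters printed above.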
0.4477611940298507 f1score
0.9428129829984544 accuracy
0.3333333333333333 precision
0.6818181818181818 recall
precision recall f1-score support
0 0.99 0.95 0.97 1250
1 0.33 0.68 0.45 44
accuracy 0.94 1294
macro avg 0.66 0.82 0.71 1294
weighted avg 0.97 0.94 0.95 1294
[[1190 60]
[ 14 30]]
{'f1_score': 0.4477611940298507,
'precision': 0.3333333333333333,
'recall': 0.6818181818181818,
'accuracy': 0.9428129829984544}
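The search above was scored with `neg_mean_absolute_error`. For 0/1 class labels the absolute error of a prediction is 1 when it is wrong and 0 when it is right, so this score is simply the negative misclassification rate (accuracy minus one). The check below illustrates the identity with the test-set confusion matrix printed above. Since the notebook tracks F1 throughout, scikit-learn's `scoring='f1'` might match that goal more closely, though this is a suggestion rather than something the notebook does.

```python
# Confusion matrix of the tuned Random Forest on the test set (from the output above).
tn, fp = 1190, 60
fn, tp = 14, 30
total = tn + fp + fn + tp

# Rebuilding the label vectors implied by the matrix.
y_true = [0] * (tn + fp) + [1] * (fn + tp)
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / total
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

print(round(mae, 4), round(accuracy, 4), round(1 - accuracy, 4))
```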
1. Feature selection method
2. IQR scale
3. Model Parameter Tuning
a. bootstrap: [True, False]
b. max_depth: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None]
c. max_features: ['auto', 'sqrt']
d. min_samples_leaf: [1, 2, 4]
e. min_samples_split: [2, 5, 10]
f. n_estimators: [30,40,50,70,80,100,1400, 1600, 1800, 2000]
The score of the tuned Random Forest model was obtained by:
1. Feature selection method
2. IQR scale
# Storing the model performance for Gradient Boost model for IQR scale values from 2.3 to 4.2
chart_IQR_scale_tune_GDBoost=check_best_IQR_scale(GradientBoostingClassifier()
,ADASYN(random_state=42,
n_neighbors=5,
n_jobs=-1),
[2.3,2.4,2.5,2.6,2.8
,3.0,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9,4,4.1,4.2])
plot_best_score(chart_IQR_scale_tune_GDBoost,"Tuning by IQR scale for Gradient Boost model")
0.4
| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 3.6 | 0.4 | 0.262411 | 0.840909 | 0.914219 |
| IQR scale | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 2.300000 | 0.353488 | 0.222222 | 0.863636 | 0.892581 |
| 2.400000 | 0.371859 | 0.238710 | 0.840909 | 0.903400 |
| 2.500000 | 0.391960 | 0.251613 | 0.886364 | 0.906491 |
| 2.600000 | 0.371134 | 0.240000 | 0.818182 | 0.905719 |
| 2.800000 | 0.385027 | 0.251748 | 0.818182 | 0.911128 |
| 3.000000 | 0.365482 | 0.235294 | 0.818182 | 0.903400 |
| 3.100000 | 0.372340 | 0.243056 | 0.795455 | 0.908810 |
| 3.200000 | 0.358696 | 0.235714 | 0.750000 | 0.908810 |
| 3.300000 | 0.385027 | 0.251748 | 0.818182 | 0.911128 |
| 3.400000 | 0.376344 | 0.246479 | 0.795455 | 0.910355 |
| 3.500000 | 0.391304 | 0.257143 | 0.818182 | 0.913447 |
| 3.600000 | 0.400000 | 0.262411 | 0.840909 | 0.914219 |
| 3.700000 | 0.382979 | 0.250000 | 0.818182 | 0.910355 |
| 3.800000 | 0.400000 | 0.264706 | 0.818182 | 0.916538 |
| 3.900000 | 0.384615 | 0.253623 | 0.795455 | 0.913447 |
| 4.000000 | 0.393443 | 0.258993 | 0.818182 | 0.914219 |
| 4.100000 | 0.382514 | 0.251799 | 0.795455 | 0.912674 |
| 4.200000 | 0.345946 | 0.226950 | 0.727273 | 0.906491 |
# Storing optimum value of IQR scale for Gradient Boost model
IQR_scale=3.6
# Storing the data with outliers treated at the optimum IQR scale (3.6)
data_GBoost=impute_outliers(bank_data_stage_2,IQR_scale,list(ascendingly_order_outlier_occr_feat[:33]))
# Checking the overall performance of the model with the optimum IQR scale
get_model_performance(GradientBoostingClassifier(),
data_GBoost,ADASYN(random_state=42,n_neighbors=5,n_jobs=-1)
,display_report=True)
0.4 f1score
0.9142194744976816 accuracy
0.2624113475177305 precision
0.8409090909090909 recall
precision recall f1-score support
0 0.99 0.92 0.95 1250
1 0.26 0.84 0.40 44
accuracy 0.91 1294
macro avg 0.63 0.88 0.68 1294
weighted avg 0.97 0.91 0.93 1294
[[1146 104]
[ 7 37]]
{'f1_score': 0.4,
'precision': 0.2624113475177305,
'recall': 0.8409090909090909,
'accuracy': 0.9142194744976816}
# Initializing dictionary object to store the model performance metrics
feature_selection_tune_GBBoost={}
# Storing the first 50 highest ranked features
features_=list(feat_score['columns'].values[:50])
# Iterating from the 51st highest ranked feature to the last ranked feature
for i in feat_score['columns'].values[50:]:
    # Adding next ranked feature to the feature list
    features_.append(i)
    # Storing the model performance metrics for the features
    feature_selection_tune_GBBoost[len(features_)]=get_model_performance(GradientBoostingClassifier()
                                                ,data_GBoost,ADASYN(
                                                    random_state=42,
                                                    n_neighbors=5,
                                                    n_jobs=-1),
                                                display_report=False,
                                                custom_featrues=features_)
pd.DataFrame(feature_selection_tune_GBBoost).T.style.highlight_max()
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 51 | 0.359447 | 0.225434 | 0.886364 | 0.892581 |
| 52 | 0.356808 | 0.224852 | 0.863636 | 0.894127 |
| 53 | 0.370732 | 0.236025 | 0.863636 | 0.900309 |
| 54 | 0.362745 | 0.231250 | 0.840909 | 0.899536 |
| 55 | 0.360976 | 0.229814 | 0.840909 | 0.898764 |
| 56 | 0.378109 | 0.242038 | 0.863636 | 0.903400 |
| 57 | 0.383420 | 0.248322 | 0.840909 | 0.908037 |
| 58 | 0.362637 | 0.239130 | 0.750000 | 0.910355 |
| 59 | 0.368715 | 0.244444 | 0.750000 | 0.912674 |
| 60 | 0.375691 | 0.248175 | 0.772727 | 0.912674 |
| 61 | 0.378378 | 0.248227 | 0.795455 | 0.911128 |
| 62 | 0.333333 | 0.218310 | 0.704545 | 0.904173 |
| 63 | 0.349206 | 0.227586 | 0.750000 | 0.904946 |
| 64 | 0.353591 | 0.233577 | 0.727273 | 0.909583 |
| 65 | 0.364641 | 0.240876 | 0.750000 | 0.911128 |
| 66 | 0.362637 | 0.239130 | 0.750000 | 0.910355 |
| 67 | 0.380435 | 0.250000 | 0.795455 | 0.911901 |
| 68 | 0.378378 | 0.248227 | 0.795455 | 0.911128 |
| 69 | 0.395604 | 0.260870 | 0.818182 | 0.914992 |
| 70 | 0.390805 | 0.261538 | 0.772727 | 0.918083 |
| 71 | 0.386740 | 0.255474 | 0.795455 | 0.914219 |
| 72 | 0.385027 | 0.251748 | 0.818182 | 0.911128 |
| 73 | 0.400000 | 0.262411 | 0.840909 | 0.914219 |
feature_selection_tune_GBBoost_=pd.DataFrame(feature_selection_tune_GBBoost).T
plot_best_score(feature_selection_tune_GBBoost_,"Tuning by Feature Selection for Gradient Boost model")
0.4
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 73 | 0.4 | 0.262411 | 0.840909 | 0.914219 |
| Number of features | f1_score | precision | recall | accuracy |
|---|---|---|---|---|
| 51 | 0.359447 | 0.225434 | 0.886364 | 0.892581 |
| 52 | 0.356808 | 0.224852 | 0.863636 | 0.894127 |
| 53 | 0.370732 | 0.236025 | 0.863636 | 0.900309 |
| 54 | 0.362745 | 0.231250 | 0.840909 | 0.899536 |
| 55 | 0.360976 | 0.229814 | 0.840909 | 0.898764 |
| 56 | 0.378109 | 0.242038 | 0.863636 | 0.903400 |
| 57 | 0.383420 | 0.248322 | 0.840909 | 0.908037 |
| 58 | 0.362637 | 0.239130 | 0.750000 | 0.910355 |
| 59 | 0.368715 | 0.244444 | 0.750000 | 0.912674 |
| 60 | 0.375691 | 0.248175 | 0.772727 | 0.912674 |
| 61 | 0.378378 | 0.248227 | 0.795455 | 0.911128 |
| 62 | 0.333333 | 0.218310 | 0.704545 | 0.904173 |
| 63 | 0.349206 | 0.227586 | 0.750000 | 0.904946 |
| 64 | 0.353591 | 0.233577 | 0.727273 | 0.909583 |
| 65 | 0.364641 | 0.240876 | 0.750000 | 0.911128 |
| 66 | 0.362637 | 0.239130 | 0.750000 | 0.910355 |
| 67 | 0.380435 | 0.250000 | 0.795455 | 0.911901 |
| 68 | 0.378378 | 0.248227 | 0.795455 | 0.911128 |
| 69 | 0.395604 | 0.260870 | 0.818182 | 0.914992 |
| 70 | 0.390805 | 0.261538 | 0.772727 | 0.918083 |
| 71 | 0.386740 | 0.255474 | 0.795455 | 0.914219 |
| 72 | 0.385027 | 0.251748 | 0.818182 | 0.911128 |
| 73 | 0.400000 | 0.262411 | 0.840909 | 0.914219 |
# Storing X_train set and y_train set for GradientBoostingClassifier model
X_train=data_GBoost.drop("Bankrupt?",axis=1)
y_train=data_GBoost["Bankrupt?"]
# Scaling the data
scaller=StandardScaler()
X_train_=scaller.fit_transform(X_train)
X_test_=scaller.transform(X_test)
# Oversampling the training data
X_train_,y_train_=ADASYN(random_state=42,n_neighbors=5,n_jobs=-1).fit_resample(X_train_,y_train)
# Defining the hyperparameter search space for the GradientBoostingClassifier model
# ("deviance" and "mae" are accepted only by older scikit-learn releases)
param_grid = {
    "loss":["deviance"],
    "learning_rate": [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2],
    "min_samples_split": np.linspace(0.1, 0.5, 12),
    "min_samples_leaf": np.linspace(0.1, 0.5, 12),
    "max_depth":[3,5,8],
    "max_features":["log2","sqrt"],
    "criterion": ["friedman_mse", "mae"],
    "subsample":[0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0],
    "n_estimators":[10,20,30,40,80,100,200]
}
# Initializing the Gradient Boosting model
model=GradientBoostingClassifier()
# Initializing the Randomized Search CV object
grid=RandomizedSearchCV(estimator=model,param_distributions=param_grid,
                        cv=5,verbose=10,
                        scoring='neg_mean_absolute_error')
# Fitting the search
grid.fit(X_train_,y_train_)
print(grid.best_params_)
# Storing the best estimator found by the search
model=grid.best_estimator_
# Printing the test-set metrics of the best estimator
print_metrics(predictions=model.predict(X_test_))
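In scikit-learn, a float given for `min_samples_split` or `min_samples_leaf` is read as a fraction of the training samples rather than as a count. The sketch below converts the grid values used above into sample counts to show how restrictive they are. The training-set size of 10,000 rows is hypothetical, since the real size is not shown in this section.

```python
import math

n_samples = 10_000  # hypothetical training-set size, for illustration only

# Same 12 values as np.linspace(0.1, 0.5, 12), computed without NumPy.
fractions = [0.1 + i * (0.5 - 0.1) / 11 for i in range(12)]

for frac in fractions:
    # scikit-learn uses ceil(fraction * n_samples) as the effective minimum.
    min_count = math.ceil(frac * n_samples)
    # Every leaf must hold at least min_count samples, which caps the leaf count.
    max_leaves = n_samples // min_count
    print(f"{frac:.4f} -> at least {min_count} samples per leaf, at most {max_leaves} leaves")
```

With `min_samples_leaf` near 0.39, for example, a tree can make at most one split regardless of `max_depth`, which may help explain the weak cross-validation scores in the log that follows.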
Fitting 5 folds for each of 10 candidates, totalling 50 fits
[CV 1/5; 1/10] START criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0
[CV 1/5; 1/10] END criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0;, score=-0.217 total time= 18.0s
[CV 2/5; 1/10] START criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0
[CV 2/5; 1/10] END criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0;, score=-0.146 total time= 14.0s
[CV 3/5; 1/10] START criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0
[CV 3/5; 1/10] END criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0;, score=-0.139 total time= 15.8s
[CV 4/5; 1/10] START criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0
[CV 4/5; 1/10] END criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0;, score=-0.196 total time= 14.1s
[CV 5/5; 1/10] START criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0
[CV 5/5; 1/10] END criterion=mae, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.390909090909091, min_samples_split=0.5, n_estimators=40, subsample=1.0;, score=-0.147 total time= 17.1s
[CV 1/5; 2/10] START criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9
[CV 1/5; 2/10] END criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9;, score=-0.183 total time= 9.4s
[CV 2/5; 2/10] START criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9
[CV 2/5; 2/10] END criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9;, score=-0.135 total time= 8.5s
[CV 3/5; 2/10] START criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9
[CV 3/5; 2/10] END criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9;, score=-0.156 total time= 8.9s
[CV 4/5; 2/10] START criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9
[CV 4/5; 2/10] END criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9;, score=-0.159 total time= 8.9s
[CV 5/5; 2/10] START criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9
[CV 5/5; 2/10] END criterion=mae, learning_rate=0.2, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.17272727272727273, min_samples_split=0.13636363636363638, n_estimators=20, subsample=0.9;, score=-0.147 total time= 8.4s
[CV 1/5; 3/10] START criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8
[CV 1/5; 3/10] END criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8;, score=-0.262 total time= 2.5s
[CV 2/5; 3/10] START criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8
[CV 2/5; 3/10] END criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8;, score=-0.169 total time= 3.3s
[CV 3/5; 3/10] START criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8
[CV 3/5; 3/10] END criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8;, score=-0.211 total time= 3.0s
[CV 4/5; 3/10] START criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8
[CV 4/5; 3/10] END criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8;, score=-0.203 total time= 2.7s
[CV 5/5; 3/10] START criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8
[CV 5/5; 3/10] END criterion=mae, learning_rate=0.025, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.2090909090909091, min_samples_split=0.42727272727272736, n_estimators=10, subsample=0.8;, score=-0.191 total time= 2.9s
[CV 1/5; 4/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9
[CV 1/5; 4/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9;, score=-0.498 total time= 0.0s
[CV 2/5; 4/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9
[CV 2/5; 4/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9;, score=-0.498 total time= 0.0s
[CV 3/5; 4/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9
[CV 3/5; 4/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9;, score=-0.498 total time= 0.0s
[CV 4/5; 4/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9
[CV 4/5; 4/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9;, score=-0.498 total time= 0.0s
[CV 5/5; 4/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9
[CV 5/5; 4/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=sqrt, min_samples_leaf=0.5, min_samples_split=0.5, n_estimators=40, subsample=0.9;, score=-0.498 total time= 0.0s
[CV 1/5; 5/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95
[CV 1/5; 5/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95;, score=-0.159 total time= 0.4s
[CV 2/5; 5/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95
[CV 2/5; 5/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95;, score=-0.120 total time= 0.4s
[CV 3/5; 5/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95
[CV 3/5; 5/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95;, score=-0.143 total time= 0.4s
[CV 4/5; 5/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95
[CV 4/5; 5/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95;, score=-0.193 total time= 0.5s
[CV 5/5; 5/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95
[CV 5/5; 5/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.42727272727272736, min_samples_split=0.17272727272727273, n_estimators=100, subsample=0.95;, score=-0.145 total time= 0.4s
[CV 1/5; 6/10] START criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0
[CV 1/5; 6/10] END criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0;, score=-0.147 total time= 0.4s
[CV 2/5; 6/10] START criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0
[CV 2/5; 6/10] END criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0;, score=-0.101 total time= 0.4s
[CV 3/5; 6/10] START criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0
[CV 3/5; 6/10] END criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0;, score=-0.134 total time= 0.4s
[CV 4/5; 6/10] START criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0
[CV 4/5; 6/10] END criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0;, score=-0.147 total time= 0.4s
[CV 5/5; 6/10] START criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0
[CV 5/5; 6/10] END criterion=friedman_mse, learning_rate=0.15, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.46363636363636374, min_samples_split=0.24545454545454548, n_estimators=100, subsample=1.0;, score=-0.120 total time= 0.4s
[CV 1/5; 7/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85
[CV 1/5; 7/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85;, score=-0.153 total time= 0.3s
[CV 2/5; 7/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85
[CV 2/5; 7/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85;, score=-0.102 total time= 0.3s
[CV 3/5; 7/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85
[CV 3/5; 7/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85;, score=-0.139 total time= 0.4s
[CV 4/5; 7/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85
[CV 4/5; 7/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85;, score=-0.161 total time= 0.4s
[CV 5/5; 7/10] START criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85
[CV 5/5; 7/10] END criterion=friedman_mse, learning_rate=0.075, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.31818181818181823, min_samples_split=0.24545454545454548, n_estimators=80, subsample=0.85;, score=-0.137 total time= 0.5s
[CV 1/5; 8/10] START criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618
[CV 1/5; 8/10] END criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618;, score=-0.183 total time= 0.3s
[CV 2/5; 8/10] START criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618
[CV 2/5; 8/10] END criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618;, score=-0.115 total time= 0.2s
[CV 3/5; 8/10] START criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618
[CV 3/5; 8/10] END criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618;, score=-0.153 total time= 0.1s
[CV 4/5; 8/10] START criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618
[CV 4/5; 8/10] END criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618;, score=-0.190 total time= 0.1s
[CV 5/5; 8/10] START criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618
[CV 5/5; 8/10] END criterion=friedman_mse, learning_rate=0.1, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.390909090909091, n_estimators=30, subsample=0.618;, score=-0.140 total time= 0.1s
[CV 1/5; 9/10] START criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0
[CV 1/5; 9/10] END criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0;, score=-0.200 total time= 21.2s
[CV 2/5; 9/10] START criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0
[CV 2/5; 9/10] END criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0;, score=-0.116 total time= 15.8s
[CV 3/5; 9/10] START criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0
[CV 3/5; 9/10] END criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0;, score=-0.151 total time= 18.8s
[CV 4/5; 9/10] START criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0
[CV 4/5; 9/10] END criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0;, score=-0.173 total time= 15.8s
[CV 5/5; 9/10] START criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0
[CV 5/5; 9/10] END criterion=mae, learning_rate=0.15, loss=deviance, max_depth=5, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.28181818181818186, n_estimators=30, subsample=1.0;, score=-0.167 total time= 15.1s
[CV 1/5; 10/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0
[CV 1/5; 10/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0;, score=-0.225 total time= 0.0s
[CV 2/5; 10/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0
[CV 2/5; 10/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0;, score=-0.144 total time= 0.0s
[CV 3/5; 10/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0
[CV 3/5; 10/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0;, score=-0.161 total time= 0.0s
[CV 4/5; 10/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0
[CV 4/5; 10/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0;, score=-0.172 total time= 0.0s
[CV 5/5; 10/10] START criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0
[CV 5/5; 10/10] END criterion=friedman_mse, learning_rate=0.025, loss=deviance, max_depth=3, max_features=log2, min_samples_leaf=0.24545454545454548, min_samples_split=0.2090909090909091, n_estimators=10, subsample=1.0;, score=-0.180 total time= 0.0s
{'subsample': 1.0, 'n_estimators': 100, 'min_samples_split': 0.24545454545454548, 'min_samples_leaf': 0.46363636363636374, 'max_features': 'log2', 'max_depth': 3, 'loss': 'deviance', 'learning_rate': 0.15, 'criterion': 'friedman_mse'}
0.28671328671328666 f1score
0.8423493044822257 accuracy
0.16942148760330578 precision
0.9318181818181818 recall
              precision    recall  f1-score   support
           0       1.00      0.84      0.91      1250
           1       0.17      0.93      0.29        44
    accuracy                           0.84      1294
   macro avg       0.58      0.89      0.60      1294
weighted avg       0.97      0.84      0.89      1294
[[1049  201]
 [   3   41]]
{'f1_score': 0.28671328671328666,
'precision': 0.16942148760330578,
'recall': 0.9318181818181818,
'accuracy': 0.8423493044822257}
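The four scores above follow directly from the confusion matrix. As a quick check, here is a minimal sketch that recomputes them from the printed counts, assuming the matrix is laid out as [[TN, FP], [FN, TP]] (the layout scikit-learn uses for labels 0 and 1). The helper name `metrics_from_counts` is made up for this illustration and is not part of the notebook.

```python
# Counts read from the Gradient Boost confusion matrix printed above,
# assuming the layout [[TN, FP], [FN, TP]].
tn, fp = 1049, 201
fn, tp = 3, 41


def metrics_from_counts(tn, fp, fn, tp):
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)   # share of predicted bankruptcies that were real
    recall = tp / (tp + fn)      # share of real bankruptcies that were found
    return {
        "accuracy": (tn + tp) / total,
        "precision": precision,
        "recall": recall,
        "f1_score": 2 * precision * recall / (precision + recall),
    }


gb_metrics = metrics_from_counts(tn, fp, fn, tp)
for name, value in gb_metrics.items():
    print(f"{name}: {value:.4f}")
```

Of the 242 companies flagged as bankrupt, only 41 actually were, which is why precision stays near 0.17 even though 41 of the 44 real bankruptcies are caught.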
1. Feature selection method
2. IQR scale
3. Model Parameter Tuning
a. loss: ["deviance"]
b. learning_rate: [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2]
c. min_samples_split: np.linspace(0.1, 0.5, 12)
d. min_samples_leaf: np.linspace(0.1, 0.5, 12)
e. max_depth: [3, 5, 8]
f. max_features: ["log2", "sqrt"]
g. criterion: ["friedman_mse", "mae"]
h. subsample: [0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0]
i. n_estimators: [10, 20, 30, 40, 80, 100, 200]
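To get a feel for the size of this search space, the sketch below rebuilds the grid with the standard library only and draws 10 random candidates from it. The `[CV k/5; n/10]` log format suggests the notebook used scikit-learn's RandomizedSearchCV with 10 candidates and 5 folds, but that is an assumption. The `linspace` helper, the seed, and the per-parameter sampling are illustrative and not the library's exact procedure.

```python
import math
import random


def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive (stand-in for np.linspace)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# The grid listed above, written out as plain Python lists.
param_grid = {
    "loss": ["deviance"],
    "learning_rate": [0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2],
    "min_samples_split": linspace(0.1, 0.5, 12),
    "min_samples_leaf": linspace(0.1, 0.5, 12),
    "max_depth": [3, 5, 8],
    "max_features": ["log2", "sqrt"],
    "criterion": ["friedman_mse", "mae"],
    "subsample": [0.5, 0.618, 0.8, 0.85, 0.9, 0.95, 1.0],
    "n_estimators": [10, 20, 30, 40, 80, 100, 200],
}

# Number of distinct parameter combinations an exhaustive grid search would fit.
n_combinations = math.prod(len(values) for values in param_grid.values())

# Draw 10 candidates by picking one value per parameter at random.
# (Illustrative only: the seed is arbitrary and duplicates are not filtered out.)
rng = random.Random(42)
n_iter = 10
candidates = [
    {name: rng.choice(values) for name, values in param_grid.items()}
    for _ in range(n_iter)
]

print("combinations in the full grid:", n_combinations)
print("model fits with 10 candidates and 5 folds:", n_iter * 5)
```

The full grid holds 592,704 combinations, so sampling 10 candidates covers only a tiny fraction of it. The values "deviance" and "mae" are kept as they appear in the log; recent scikit-learn releases no longer accept these names.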
The score of the tuned Gradient Boost model was obtained using:
1. Feature selection method
2. IQR scale
Parameters considered for tuning model:
1. XGBoost
2. Gradient Boost
3. Random Forest
We will consider two models: XGBoost and Random Forest.
Why not Gradient Boost? Because Gradient Boost has a very good recall (0.93), but its precision-recall trade-off is poor: precision is only 0.17.
Best Parameters
# Data suitable for the XGBoost model
IQR_scale_xgb = 3.8
data_xgb_ = impute_outliers(bank_data_stage_2, IQR_scale_xgb, list(ascendingly_order_outlier_occr_feat[:33]))
# Keep the 52 top-ranked features
features_for_XgBoost_ = list(feat_score['columns'].values[:52])
# Oversample the minority class with ADASYN
over_sampling_method = ADASYN(random_state=42, n_neighbors=5, n_jobs=-1)
# Model
xgboost = XGBClassifier()
xgb_model_metrics, xgb_model, scalled_test_set_xgb = get_model_performance(
    xgboost,
    data_xgb_,
    over_sampling_method,
    display_report=True,
    custom_featrues=features_for_XgBoost_,
    return_trained_model=True,
)
# Predicted probability of the positive (bankrupt) class
y_pred_proba_xgb = xgb_model.predict_proba(scalled_test_set_xgb)[:, 1]
precision_xgb, recall_xgb, _xgb = precision_recall_curve(y_test, y_pred_proba_xgb)
fpr_xgb, tpr_xgb, _xgb = roc_curve(y_test, y_pred_proba_xgb, pos_label=1)
print("AUC SCORE OF XGBOOST", roc_auc_score(y_test, y_pred_proba_xgb))
[14:28:34] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.5.1/src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
0.5454545454545454 f1score
0.9574961360123647 accuracy
0.42857142857142855 precision
0.75 recall
              precision    recall  f1-score   support
           0       0.99      0.96      0.98      1250
           1       0.43      0.75      0.55        44
    accuracy                           0.96      1294
   macro avg       0.71      0.86      0.76      1294
weighted avg       0.97      0.96      0.96      1294
[[1206   44]
 [  11   33]]
AUC SCORE OF XGBOOST 0.9434727272727272
# Data suitable for the Random Forest model
IQR_scale_rf = 2.5
data_rf_ = impute_outliers(bank_data_stage_2, IQR_scale_rf, list(ascendingly_order_outlier_occr_feat[:33]))
# Keep the 69 top-ranked features
features_for_Rf_ = list(feat_score['columns'].values[:69])
rf_model_metrics, rf_model, scalled_test_set_rf = get_model_performance(
    RandomForestClassifier(),
    data_rf_,
    ADASYN(random_state=42, n_neighbors=5, n_jobs=-1),
    display_report=True,
    custom_featrues=features_for_Rf_,
    return_trained_model=True,
)
# Predicted probability of the positive (bankrupt) class
y_pred_proba_rf = rf_model.predict_proba(scalled_test_set_rf)[:, 1]
precision_rf, recall_rf, _rf = precision_recall_curve(y_test, y_pred_proba_rf)
fpr_rf, tpr_rf, _rf = roc_curve(y_test, y_pred_proba_rf, pos_label=1)
print("AUC SCORE OF Random Forest", roc_auc_score(y_test, y_pred_proba_rf))
0.4680851063829786 f1score
0.9420401854714064 accuracy
0.3402061855670103 precision
0.75 recall
              precision    recall  f1-score   support
           0       0.99      0.95      0.97      1250
           1       0.34      0.75      0.47        44
    accuracy                           0.94      1294
   macro avg       0.67      0.85      0.72      1294
weighted avg       0.97      0.94      0.95      1294
[[1186   64]
 [  11   33]]
AUC SCORE OF Random Forest 0.9535636363636364
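One way to read the AUC scores printed above: AUC is the fraction of (bankrupt, non-bankrupt) pairs in which the bankrupt company receives the higher predicted probability, with ties counted as half. The sketch below implements that definition in plain Python. The function name `pairwise_auc` and the four toy scores are made up for illustration and are not taken from the notebook's data.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))


# Toy example (hypothetical values): 2 positives, 2 negatives, so 4 pairs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

The test set here has 44 bankrupt and 1250 non-bankrupt companies, i.e. 55,000 such pairs. Under this reading, the XGBoost AUC of about 0.943 corresponds to roughly 51,891 correctly ordered pairs and the Random Forest AUC of about 0.954 to roughly 52,446. Random Forest therefore ranks companies slightly better overall, even though XGBoost has the higher precision in the reports above.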
1. XGBoost
2. Random Forest
plt.style.use('seaborn')
# Plot ROC curves for both models
plt.plot(fpr_xgb, tpr_xgb, linestyle='--', color='orange', label='XGBoost')
plt.plot(fpr_rf, tpr_rf, linestyle='--', color='green', label='Random Forest')
# Title and axis labels
plt.title('ROC curve')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='best')
plt.savefig('ROC', dpi=300)
plt.show()
1. XGBoost
2. Random Forest
plt.figure(figsize=(10, 5), dpi=200)
plt.plot(precision_xgb, recall_xgb, color='purple', label="XGBoost")
plt.plot(precision_rf, recall_rf, color='red', label="Random Forest")
# Add title and axis labels
plt.title('Precision-Recall Curve')
plt.xlabel('Precision')
plt.ylabel('Recall')
# Highlight the reported (precision, recall) point of each model
plt.plot(0.3402, 0.75, label="Best Random Forest Precision-Recall Trade-off", markersize=15, markeredgecolor="red", markerfacecolor="green", marker="o")
plt.plot(0.42857, 0.75, label="Best XGBoost Precision-Recall Trade-off", markersize=15, markeredgecolor="red", markerfacecolor="black", marker="o")
plt.legend()
plt.savefig('PRC', dpi=300)
plt.show()
The critical case is a company going bankrupt, which puts stakeholders and other investors at risk, so the model should do as much as possible to avoid misclassifying the minority class, i.e. companies that go bankrupt.
Situation 1: If a majority-class company (one that does not go bankrupt) is misclassified as bankrupt, investors or stakeholders might pass over that company even though in reality it did not go bankrupt. This is also a loss, but not as large as in Situation 2.
Situation 2: This situation is a nightmare for people who have put a lot of money into the company (stakeholders or investors). If a minority-class company (one that goes bankrupt) is misclassified as not bankrupt, anyone who invested in it on that basis loses a lot of money and time when the company actually fails. This is clearly a much worse outcome than Situation 1.
## Recall expresses the ability to find all relevant instances in a dataset; precision expresses the proportion of the data points the model flagged as relevant that actually were relevant.
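Because a missed bankruptcy (Situation 2) is worse than a false alarm (Situation 1), one way to compare the models is an F-beta score with beta > 1, which weights recall more heavily than precision. The sketch below uses beta = 2, which is an illustrative choice and not something the notebook itself computes. The counts are read from the confusion matrices printed earlier, assuming the layout [[TN, FP], [FN, TP]].

```python
def f_beta(tp, fp, fn, beta):
    """F-beta from raw counts; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


# (TP, FP, FN) for each model, read from the confusion matrices above.
model_counts = {
    "Gradient Boost": (41, 201, 3),
    "XGBoost": (33, 44, 11),
    "Random Forest": (33, 64, 11),
}

# beta = 2 is an assumption made for this illustration.
f2_scores = {
    name: f_beta(tp, fp, fn, beta=2.0)
    for name, (tp, fp, fn) in model_counts.items()
}

for name, score in f2_scores.items():
    print(f"{name}: F2 = {score:.3f}")
```

Even with recall weighted this heavily, XGBoost scores highest (about 0.652), followed by Random Forest (about 0.604) and Gradient Boost (about 0.490). The extra bankruptcies Gradient Boost catches do not make up for its 201 false alarms.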
Models: XGBoost and Random Forest give the best trade-off, each with a recall of about 75%, i.e. each correctly classifies about 75% of the companies that went bankrupt.